1. Introduction


Because of the many practical applications [1] of a
wideband network with the narrowband speech processor (NSP),
and the wide and easy access offered by the telephone system, a great
deal of interest has been shown in interfacing the telephone
line to the wideband network. The effect of this
interfacing on the narrowband speech processor, however,
must be carefully considered.

The  fact  that  all  narrowband  speech  processors  at 
present  are  based  upon  a  linear  all-pole  model  of  speech 
production  (for  voiced  speech)  is  important.  If  speech  is 
recorded  under  carefully  controlled  conditions,  i.e.,  very 
high  signal-to-noise  ratio,  minimal  room  reverberation, 
minimal  phase  distortion  and  a  high  quality  microphone,  then 
high  quality  synthetic  speech  can  be  generated  with  a 
carefully  implemented  simulation.  However,  it  has  been 
demonstrated  that  even  relatively  small  amounts  of  additive 
white  noise  cause  the  LP  coefficients  to  represent  wider 
bandwidth  resonances,  resulting  in  the  perception  of 
buzziness  in  the  synthesis.  Thus,  spectral  distortion  of 
the  input  signal  occurs.  Two  examples: 


1. If a tone and a speech signal having roughly equal
powers are added, a human listener will perceive the
resulting interference as a narrowband whistle. During
analysis and synthesis several of the NSP linear
prediction coefficients will partially represent the
high amplitude tone, thus reducing the accuracy of the
speech representation.

2. If speech is bandpass filtered to 0.3-3.0 kHz, a
listener will perceive the speech as "hollow" even
though  no  gross  distortion  has  occurred.  Depending  on 
the  resistance  of  the  pitch  extraction  method  to  the 
loss  of  harmonic  structure  in  the  spectrum,  the 
synthesis  may  remain  unchanged,  or  it  may  become 
totally  unacceptable,  due  to  gross  errors  in  the  pitch 
period  tracking. 
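
The broadening of LP resonances by additive white noise, noted before the two examples above, can be checked with a small numerical sketch. It is not taken from the report. It assumes a single second-order resonance (500 Hz center frequency, 80 Hz bandwidth, 8 kHz sampling, all illustrative values), uses the theoretical autocorrelation of that resonance, and models white noise as extra power at lag zero only. All function names are hypothetical.

```python
import math

FS = 8000.0              # assumed sampling rate, Hz
F0, BW = 500.0, 80.0     # assumed resonance center and bandwidth, Hz

def lp2_from_autocorr(r0, r1, r2):
    """Solve the order-2 autocorrelation normal equations for (a1, a2)."""
    det = r0 * r0 - r1 * r1
    a1 = r1 * (r0 - r2) / det
    a2 = (r0 * r2 - r1 * r1) / det
    return a1, a2

def resonance_bandwidth_hz(a2, fs):
    """Bandwidth of the complex pole pair of 1/(1 - a1 z^-1 - a2 z^-2)."""
    radius = math.sqrt(-a2)          # valid for complex poles (a2 < 0)
    return -math.log(radius) * fs / math.pi

# True predictor coefficients of the assumed resonance.
r = math.exp(-math.pi * BW / FS)
a1_true = 2.0 * r * math.cos(2.0 * math.pi * F0 / FS)
a2_true = -r * r

# Normalized autocorrelation of the resonance (Yule-Walker relations).
rho1 = a1_true / (1.0 - a2_true)
rho2 = a1_true * rho1 + a2_true

def estimated_bandwidth(snr_db):
    """LP bandwidth estimate when white noise is added at the given SNR."""
    noise = 10.0 ** (-snr_db / 10.0)   # noise power relative to signal power
    _, a2_hat = lp2_from_autocorr(1.0 + noise, rho1, rho2)
    return resonance_bandwidth_hz(a2_hat, FS)

clean_bw = resonance_bandwidth_hz(lp2_from_autocorr(1.0, rho1, rho2)[1], FS)
noisy_bw = estimated_bandwidth(20.0)
```

Under these assumptions the noise-free analysis recovers the 80 Hz bandwidth, while a 20 dB signal-to-noise ratio gives an estimate more than twice as wide.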

Figure  1  is  a  set  of  block  diagrams  which  illustrates 
two  different  examples  of  interfacing  the  telephone  line  to 
the  wideband  network.  The  major  difference  between  these 
block  diagrams  is  the  location  of  the  hybrid,  which  is  used 
to convert the 2-wire telephone line to a 4-wire telephone
line. In the configuration of Figure 1(a), the hybrid and
digital  processing  unit,  which  includes  a  low  pass  filter, 
automatic  gain  control,  A/D,  D/A,  analyzer,  synthesizer, 
double  talker  detector  and  echo  canceller,  are  at  two 
different  locations.  This  separation  makes  the  problem  a 
little  more  difficult  for  the  echo  canceller.  Here  we  are 
interested  in  the  configuration  of  Figure  1(b),  where  the 
hybrid  and  the  digital  processor  are  at  the  same  location. 

Fig. 1. [...] to the wideband system.

The block diagram of Figure 1(b) can be represented in
more detail as shown in Figure 2. The user at site A,
without  any  local  site  equipment  except  for  the  telephone, 
wants to access a user at site D. Sites B and C each
have a complete digital speech processing system and are
linked by a leased line channel. The user
at site A might be motivated to dial site D via sites B
and C for economy and efficiency or for
security. Site A has to access site B
via a local or a long distance call, depending on the
location of site B. The same applies to sites C and D.
When  sites  A  and  D  get  access  to  sites  B  and  C  respectively, 
via  local  calls  as  shown  in  Figure  2,  then  first  the  user  of 
site  A  is  connected  to  site  B  by  a  2-wire  line  through  a 
local  office  of  site  A  and  site  B.  In  the  case  of  long 
distance  calls  the  site  A  is  connected  to  the  local  office 
of  site  A  by  a  2-wire  line,  then  the  local  office  of  site  A 
is  connected  to  the  site  B  local  office  by  a  "trunk"  which 
is  typically  a  4-wire  circuit.  At  the  site  B  local  office, 
the  4-wire  circuit  is  converted  to  a  2-wire  circuit  which  is 
then connected at site B. In any case, the initial
connection at site B is a voice connecting arrangement
(VCA). This device mainly protects the telephone line from
any customer equipment. It also performs supervisory
functions such as ringing detection, off-hook signaling and
automatic seizing of the line. Next, the VCA is connected
to a hybrid which converts the 2-wire to the 4-wire
connection, which in turn is connected to the digital
processing system. This system contains a low pass filter
for anti-aliasing, automatic gain control to compensate for
telephone line loss and maximize the signal to quantizing
noise ratio, A/D and D/A, analyzer, synthesizer, double
talker detector and echo canceller.

Fig. 2. Block Diagram of Telephone Line Interfacing to the Wideband [...] (Site B and Site C joined by the transmission channel)

The  2-wire  to  4-wire  conversion  typically  results  in 
some  of  the  analog  output  of  the  synthesizer  being  fed  back 
to  the  analog  input  of  the  analyzer,  with  the  amount 
depending  on  how  well  the  hybrid  is  balanced.  This  leakage 
signal  results  in  an  echo.  At  site  C  the  process  is 
duplicated. 

There are four major sources of distortion which
affect the quality of the narrowband speech processor:

1) the spectral distortion due to the carbon button
microphones, inductive loading on 2-wire lines and
impedance mismatch throughout the system; 2) the signal
loss, and hence the decrease in signal-to-noise ratio, which is a
function of the length of the 2-wire line and the way the
customer uses the phone (for example, holding the
mouthpiece far away from the mouth); 3) the echo due to the
mismatch between the hybrid balance circuit impedance and the
impedance of the 2-wire line; and 4) the automatic gain
control (AGC), which is required to compensate for the
signal loss in order to achieve a high signal to quantizing
noise ratio for better vocoder speech quality. This
automatic gain control amplifies not only the signal, but
also the noise and the echo. If the gain is high, it might
drive the whole system into singing (oscillation).

The first source of distortion (i.e., spectral
distortion) requires telephone channel equalization to
improve the synthesizer's speech quality. This equalization
is nontrivial and unique for the following reasons:

1.  Equalization  must  precede  the  nonlinear  NSP.  For 
example,  it  is  not  possible  to  process  the  incoming 
speech  with  the  NSP  and  then  post-process  the  resulting 
coefficients  with  an  equalization  filter. 

2.  Equalization  can  only  occur  over  the  frequency 
range  where  severe  attenuation  has  not  been  introduced. 
If a bandpass filter from 0.3-3.0 kHz has been
introduced with 30 dB attenuation in the stopbands,
practical  considerations  such  as  system  noise  and 
integer  quantization  noise  from  A/D  conversion  preclude 
equalizing  the  stopband  regions. 

3.  Sophisticated  equalization  techniques  used  for 
digital  modems  will  not  work  for  voice  transmission 
equalization.  The  high-speed  digital  modem  has  an 
adaptive  equalization  capability;  a  series  of  reference 
pulses  is  initially  transmitted  to  the  far-end  modem, 
which  then  adaptively  updates  the  taps  or  coefficients 
of  an  inverse  filter  or  transversal  filter  until  the 
mean  square  error  between  the  reference  pulses  and  an 
internal standard is minimized. The obvious difficulty
with  adaptive  equalization  where  voice  is  involved  is 
that  there  is  no  standard  for  comparison. 

The  above  telephone  channel  equalization  problem  was 
addressed by J.D. Markel and Steven B. Davis at Signal
Technology,  Inc.  [2].  They  showed  that  the  channel 
equalization  can  be  performed  by  using  either  a  prior 
long-term  spectral  characteristic  of  the  speaker,  or  the 
population,  as  a  reference  signal.  Also,  it  was 
demonstrated  that  the  equalized  vocoded  telephone  speech  is 
preferred  by  listeners  over  nonequalized  vocoded  telephone 
speech.  Furthermore,  listener  acceptance  of  equalized 
telephone  speech  improves  very  nearly  to  the  acceptance 
level  of  the  reference  non-telephone  band  limited  speech  by 
using  speaker  dependent  equalization.  Listener  preference 
for  speaker  independent  (population  dependent)  equalization 
is  slightly  lower. 

Here  we  discuss  the  next  three  problems — signal  loss, 
echo,  and  AGC  in  the  loop. 

The  echo  in  the  telephone  line  is  a  very  old  problem. 
A  great  deal  of  work  has  been  done  in  developing  methods  for 
echo  cancellation.  These  methods  range  from  the  simple 
introduction  of  a  physical  loss  in  the  telephone  line  to  the 
use  of  adaptive  digital  filters  which  adapt  to  the  telephone 
network.  It  has  been  shown  that  the  adaptive  digital 
filters  are  very  effective  in  satellite  telephone  networks 
where the round trip delay could be anywhere from 540 ms to 1200
ms [3].

In all the previous studies the vocoder was not present
in the loop. Hence, the effect of echo and other telephone
distortions combined with the vocoder is not known. Also,
it is important to know how the performance of an adaptive
digital filter is affected by the presence of the vocoder.
A new adaptive filtering algorithm for echo cancellation
in the frequency domain was developed, and its performance
will be compared to some existing algorithms. The existing
algorithms used were the Widrow LMS algorithm and the
Gradient Lattice algorithm. The Widrow LMS algorithm was
used because of its simplicity and robust performance (which
has been shown for many digital signal processing
applications including echo cancellation). The Gradient
Lattice was chosen because of its orthogonality property,
which is claimed to result in faster convergence.

The next important point to be considered is the effect
of signal loss on the narrowband speech processor. The
signal-to-noise ratio requirement for satisfactory
quality of synthetic speech will be defined. Two kinds of
noise will be considered here. The first is the telephone
channel (2-wire) noise and the second is the quantization
noise due to analog to digital conversion. This surely will
play an important role in defining the need or lack of need
for the AGC. Even after the need for an AGC is established, the
AGC issue must be considered very carefully. While the AGC
helps to minimize the problem of signal to quantization
noise, it also introduces other severe problems. A few of
these problems are listed below:

1.  With  the  AGC  in  the  loop  the  adaptive  filter  has  to 
keep  track  of  AGC  along  with  the  transhybrid  response 
of  the  hybrid. 

2.  The  present  double  talker  detector  algorithm  would 
not  work. 

3.  The  AGC  not  only  amplifies  the  signal,  it  also 
amplifies  the  telephone  channel  noise  and  echo. 

4. If the AGC gain becomes too large the whole system
will go into singing (oscillation).

These problems related to the AGC could be partially solved
by feeding the AGC information into the double talker
detector (DTK) and adaptive digital filter (ADF) and making
the AGC slowly varying. "Slowly varying" is a relative term
which needs to be defined for our system.

The overall study was performed via a digital
simulation of the system shown in Fig. 2. The digital
simulation was carried out on the STI VAX-11/780 computer.

2. The Test-bed Simulation

2.1  Introduction 

Shown in Fig. 1 is a typical analog telephone
connection [1]. The near end is connected through a
two-wire subscriber loop (or local loop) into the local
switch. The local switch attaches to a toll connect trunk,
which is a two-wire connection into a hybrid. The hybrid
then separates the connection into a four-wire transmit and
receive path, and the toll switch connects the near and
far-end trunks for bi-directional transmission (full
duplex). The reason for the four-wire transmission is so
that gain can be added to compensate for physical wire loss
versus length. At the far end the reverse operations are
performed to the subscriber loop. For the Wideband
Integrated Network (WIN) there is a combined analog and
digital network as illustrated in Fig. 2. The same basic
block diagram holds true except that the hybrid now is
connected to A/D and D/A converters so that the four-wire
transmission line is digital instead of analog. However,
this transformation from an analog to a digital line has severe
effects on the operation of the system.

In  Fig.  2  the  full  duplex  speech  processor  includes  an 
LPC speech analyzer and synthesizer. The telephone signal
conditioner (TSC) block includes those elements in the
system  needed  to  compensate  for  echo  introduced  by  the 
hybrid  and  necessary  conditioning  algorithms  to  enhance  the 
speech  quality  of  the  synthesized  speech  output.  To  model 
the  effects  of  the  full-duplex  network  on  the  LPC 
analyzer/synthesizer  a  full  test-bed  simulation  of  the 
system  has  been  developed.  The  main  purpose  of  the 
simulation  was  to  check  the  effectiveness  of  different 
conditioning  algorithms  on  the  over-all  end-to-end 
full-duplex  operation. 

In this section we will first examine the
interconnection problems between an analog network, the
switched telephone network (STN), and the digital network
(WIN). Then we will present an analog/digital simulation of
the network which includes those algorithms which are
necessary for the system to work satisfactorily.

2.2  Interconnection  Problems  with  STN-WIN 

Fig. 3. Block diagram of interfacing the telephone line
to a wideband system

To investigate the problems that arise from such
interconnection we follow the block diagram of Fig. 3. The
near-end speech input passes first through the telephone
mouthpiece, which introduces some distortion [2]. Then the
speech input passes through the physical two-wire line,
which introduces a loss due to the distance between the
calling party (the two-wire portion only) and the location
of the hybrid. This loss is a very significant problem as
it can range from (a reference of) 0 dB to as much as 20 dB
[1]. That means that either we improve the LPC analyzer to
perform well for a large dynamic range or we add an
Automatic Gain Control (AGC) in the loop to improve the
signal-to-quantization noise ratio at the LPC analyzer input.
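
The dynamic-range concern above can be illustrated with a rough sketch: with a fixed uniform quantizer, every dB of line loss is lost almost directly from the signal-to-quantization noise ratio. The 12-bit converter, the sine test tone and the function names below are assumptions made for illustration; the report does not specify them here.

```python
import math

def sqnr_db(amplitude, bits=12, n_samples=8000, cycles_per_sample=0.01234):
    """Measured signal-to-quantization-noise ratio of a sine wave passed
    through a uniform quantizer with full scale +/-1 (assumed converter)."""
    step = 2.0 / (2 ** bits)
    signal_power = 0.0
    error_power = 0.0
    for k in range(n_samples):
        x = amplitude * math.sin(2.0 * math.pi * cycles_per_sample * k)
        q = step * round(x / step)        # mid-tread uniform quantizer
        signal_power += x * x
        error_power += (x - q) ** 2
    return 10.0 * math.log10(signal_power / error_power)

def attenuate(amplitude, loss_db):
    """Amplitude after a line loss given in dB."""
    return amplitude * 10.0 ** (-loss_db / 20.0)

reference_sqnr = sqnr_db(0.9)                    # 0 dB reference level
lossy_sqnr = sqnr_db(attenuate(0.9, 20.0))       # after 20 dB of line loss
```

In this sketch the 20 dB loss quoted in the text costs roughly 20 dB of signal-to-quantization noise ratio, which is the gap that either an AGC or a more tolerant analyzer has to close.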

Another problem is caused by the hybrid, which is
necessary for converting the standard two-wire line input to
the four-wire system (two wires to the analyzer, two wires
from the synthesizer). The impedance
balance of the hybrid is a strict function of the loading,
and the loading is a function of the distance from the
hybrid to the calling party. The resulting imbalance introduces the
most difficult problem in the full-duplex network, the
"echo" problem. In general, the hybrid impedance is not
perfectly matched, and a signal from the far-end
speaker is fed back through the upper loop to the
far-end speaker. This return signal is the "echo".

The inherent impedance mismatches are a function of the
calling party's physical two-wire line length to the hybrid.
This impedance mismatch causes an echoed signal with a
relative level of -30 dB at best to -6 dB at worst [3]. To
understand the problems caused by the hybrid in this special
application, we have to remember that the LPC analyzer needs
a high signal-to-quantization noise ratio to achieve acceptable
speech quality. Now, if we assume that the LPC analyzer
requires a 0 dB reference for acceptable signal to
quantization noise ratio, if both near-end and far-end have
a 10 dB loss in the two-wire lines, and if the hybrids each
have 10 dB rejection, then there is a state of sustained
oscillation (unity loop gain or 0 dB "singing margin" [4])
if we use a standard automatic gain control.
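
The singing-margin arithmetic in the preceding paragraph can be written out as a short sketch. It assumes a simplified loop model in which each AGC restores exactly the line loss at its end and applies that same gain to the echo, and in which each hybrid contributes only its rejection. The function names are hypothetical.

```python
def agc_gain_db(line_loss_db):
    """Gain a level-restoring AGC applies to return the input to the
    0 dB reference (assumed equal to the two-wire line loss)."""
    return line_loss_db

def loop_gain_db(line_losses_db, hybrid_rejections_db):
    """Round-trip gain of the four-wire loop: AGC gains at both ends
    minus the rejection of both hybrids, all in dB."""
    gain = sum(agc_gain_db(loss) for loss in line_losses_db)
    return gain - sum(hybrid_rejections_db)

def singing_margin_db(line_losses_db, hybrid_rejections_db):
    """Margin below unity loop gain; 0 dB or less means sustained oscillation."""
    return -loop_gain_db(line_losses_db, hybrid_rejections_db)

# Case from the text: 10 dB loss and 10 dB hybrid rejection at each end.
report_case = singing_margin_db([10.0, 10.0], [10.0, 10.0])
# Same losses with the best-case -30 dB echo level quoted above.
best_hybrids = singing_margin_db([10.0, 10.0], [30.0, 30.0])
```

For the case in the text the margin is 0 dB (unity loop gain). With 30 dB of rejection at each hybrid the same model leaves a 40 dB margin.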

Another  problem  is  caused  by  the  interconnection  of  the 
hybrid  with  the  LPC  analyzer.  For  long  delays  in  the 
four-wire  line,  the  effect  of  the  echo,  without  the  LPC 
analyzer,  is  very  annoying  and  is  handled  generally  by 
adaptive  filter  algorithms  to  cancel  its  effect.  With  the 
LPC  analyzer  in  the  loop,  the  effect  of  the  echo  becomes 
even  worse  because  of  the  inherent  nonlinearity  in  the  LPC 
analyzer.  The  problem  arises  particularly  during  periods 
when  both  speakers  at  the  two  ends  talk  at  the  same  time. 
In  these  cases  the  input  to  the  LPC  analyzer  contains  the 
near-end  speaker  input  and  the  echo  from  the  far-end 
speaker,  and  because  of  the  nonlinearity  of  the  LPC  analyzer 
the  result  is  highly  distorted  speech  at  the  synthesizer 
output. We see from the discussion so far that a
fast-converging echo cancelling algorithm is needed in our
application to minimize the distortion introduced by the LPC
analyzer.

Another issue in the full-duplex network is the
double-talker detection algorithm. The function of this
algorithm is to detect the presence of near-end talker voice
input, so that adaptation in the adaptive filter can be
halted during such situations to avoid adapting in the wrong
direction. Basically, the double talker detector measures the energy
at both points on the four-wire side of the hybrid. When
the energy ratio between the four-wire transmit and receive
sides exceeds a certain threshold, the algorithm raises a
flag, meaning that the near-end talker is active.
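
A minimal sketch of such an energy-ratio test is given below. It is not the report's detector. The block length, the -6 dB threshold (chosen here because the worst-case echo level quoted earlier is -6 dB) and the function names are assumptions made for illustration.

```python
import math

def mean_energy(block):
    """Average energy per sample of a block."""
    return sum(s * s for s in block) / len(block)

def near_end_active(receive_block, transmit_block, threshold_db=-6.0):
    """Flag near-end speech when the transmit/receive energy ratio
    exceeds the threshold (assumed value, in dB)."""
    floor = 1e-12  # avoids log of zero on silent blocks
    ratio = (mean_energy(transmit_block) + floor) / (mean_energy(receive_block) + floor)
    return 10.0 * math.log10(ratio) > threshold_db

# Far-end signal on the four-wire receive side.
far_end = [math.sin(0.3 * n) for n in range(256)]
# Echo leaking through the hybrid, about 14 dB below the receive level.
echo = [0.2 * s for s in far_end]
# Near-end talker, present only during double talking.
near_end = [0.8 * math.sin(0.11 * n) for n in range(256)]

echo_only = near_end_active(far_end, echo)
double_talk = near_end_active(far_end, [e + v for e, v in zip(echo, near_end)])
```

With echo alone the ratio stays below the threshold and adaptation may continue. Once the near-end talker is added, the flag is raised and adaptation would be halted.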

Another  issue  to  consider  is  the  effect  of  the  STN  on 
the  input  speech  signal.  Specifically  the  STN  bandpasses 
the  speech,  resulting  in  a  range  of  300-3200  Hz. 
Furthermore,  the  pass  band  magnitude  characteristic  may  have 
10-15  dB  of  ripple  as  a  function  of  the  input  signal  energy. 
This  combination  of  channel  distortions  and  band  pass 
filtering  affects  the  quality  of  the  LPC  synthesized  speech, 
severely affecting the pitch and voiced/unvoiced algorithms.
Therefore,  it  is  necessary  to  improve  the  existing  LPC 
algorithms  to  handle  such  speech  signals. 

All the issues discussed so far are investigated in
this report in terms of algorithms for the telephone signal
conditioner (TSC). The algorithms implemented in the
test-bed simulation try to solve the problems discussed so
far. The effectiveness of those algorithms is checked
based on the output speech quality obtained by end-to-end
simulations.

In the next subsection the over-all test-bed simulation
is presented.

2.3  The  Over-all  Test-bed  Simulation 

In  this  section  we  present  the  over-all  test-bed 
simulation  program.  The  main  purpose  of  this  program  is  to 
check  the  interaction  between  different  algorithms  needed  in 
the  full-duplex  communication  network  with  vocoder  in  the 
loop. The test-bed program is written in a modular fashion
with  a  number  of  "software  switches"  which  can  be  used  to 
simulate  different  configurations.  This  flexibility  allows 
us  to  check  the  interaction  between  the  algorithms  and 
isolate  and  identify  the  problems  caused  by  such 
interaction. 

Fig.  3  presents  a  possible  block  diagram  of  a 
full-duplex  communication  network  with  vocoder  in  the  loop. 
The  user  at  site  A  is  connected  to  site  B  by  a  2-wire  line 
through  a  local  office.  The  initial  connection  at  site  B  is 
a voice connecting arrangement (VCA). This device mainly
protects  the  telephone  line  from  any  customer  equipment. 
Also,  it  performs  supervisory  functions  such  as  a  ringing 
detection,  off-hook  signaling  and  automatic  seizing  of  the 
line.  Next  the  VCA  is  connected  to  a  hybrid  which  converts 
the  two-wire  line  to  the  4-wire  connection,  which  in  turn  is 
connected  to  the  digital  processing  system.  This  system 
contains  a  low  pass  filter  for  anti-aliasing,  automatic  gain 
control to compensate for telephone line loss and maximize
the  signal  to  quantization  noise  ratio,  A/D  and  D/A, 
analyzer,  synthesizer,  double  talker  detector  and  echo 
canceller.  Such  a  block  diagram  could  be  simulated  entirely 
by a digital simulator. However, we preferred to divide the
system  into  two  parts:  1)  The  local  telephone  line  from 
customer  mouthpiece  to  output  of  VCA;  and  2)  The  wideband 
system  from  the  2-wire  input/output  part  of  the  hybrid  at 
site  B  to  site  C.  We  used  the  physically  available 
telephone  line  for  part  1,  and  performed  the  digital 
simulation  of  part  2. 

2.3.1  Local  Telephone  Line  Simulation 

The  input  speech  data  for  digital  simulation,  which  is 
the  speaker's  speech  over  the  telephone  line  and  the  output 
of  the  VCA,  is  collected  using  the  special  setup  as  shown  in 
Fig. 4 and discussed here. Site A and site D called
sites B and C, respectively, through a local office. The
outputs of the VCAs at sites B and C were recorded
simultaneously on a two-channel tape recorder for future
digitization. In order to have natural conversation between
the two speakers, the output of each VCA was also connected
to the earphone of the first telephone handset. Because of
the set-up in Fig. 4, the lengths of line 1 and line 2 were
equal. However, in general this may not be the case. Also,
the conversation was not fully realistic as it did not
include the psychological effects of a satellite delay and
an echo problem.

2.3.2  Wideband  System  Simulation 

For  the  wideband  system  we  simulated  digitally  the 
following  components: 

a)  Hybrid 

In the hybrid the response from 2-wire to 4-wire and
vice versa is not important, and can be approximated by flat
unity gain. However, the response from the 4-wire receive side
to the 4-wire transmit side, which is a function of the 2-wire
impedance, is important. This response is called the
transhybrid response. The term transhybrid loss is also
used. The transhybrid loss represents the average loss over
frequency. In other words, if the transhybrid loss is
10 dB, then the echo signal at the 4-wire transmit side
is only 10 dB lower than the signal at the 4-wire receive
side. Previous studies have shown that the average
transhybrid loss could be as low as 6 dB in the worst cases
[3]. In our case, not only is the magnitude of the average
transhybrid loss important, since it determines the strength of the echo
and hence the perceptual effect, but the
exact response is also important in order to evaluate the
performance and requirements of the echo canceller.

In  the  test-bed  simulation,  the  transhybrid  response  is 
simulated by a 64-weight FIR filter. These weights were
measured on real telephone lines and on an artificial
telephone line where the conditions of loading and line
lengths are more controlled. From these measurements a
library of 14 different impulse responses has been
collected. The user has the option to choose one of them
or, by using a software switch, to change the hybrid response
dynamically at a fixed time interval chosen by the
user. In this way a more realistic time-varying
transhybrid response can be simulated. Details on the
measurements  of  the  transhybrid  responses  and  their  results 
are  given  in  Chapter  3. 
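
As a rough illustration of this part of the simulation, the sketch below passes a signal through a 64-weight FIR filter and computes an average transhybrid loss from the weights. The impulse response is a made-up decaying oscillation, not one of the 14 measured responses, and "average loss over frequency" is interpreted here as the mean power gain, which by Parseval's relation equals the sum of squared weights. Both choices are assumptions.

```python
import math

def make_transhybrid_weights(n_taps=64, loss_db=10.0):
    """Hypothetical transhybrid impulse response scaled to a target
    average loss; stands in for a measured response."""
    raw = [(0.85 ** k) * math.cos(0.9 * k) for k in range(n_taps)]
    scale = 10.0 ** (-loss_db / 20.0) / math.sqrt(sum(r * r for r in raw))
    return [scale * r for r in raw]

def fir_filter(weights, x):
    """Echo at the 4-wire transmit side for receive-side input x."""
    return [sum(w * x[n - k] for k, w in enumerate(weights) if n - k >= 0)
            for n in range(len(x))]

def average_transhybrid_loss_db(weights):
    """Average loss over frequency, taken as the mean power gain in dB."""
    return -10.0 * math.log10(sum(w * w for w in weights))

weights = make_transhybrid_weights()
impulse = [1.0] + [0.0] * 99
echo = fir_filter(weights, impulse)          # reproduces the weights
loss = average_transhybrid_loss_db(weights)  # 10 dB by construction
```

Switching between several such weight sets at a fixed interval would give the time-varying behavior described above.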

b)  Echo  canceller 

The  echo  canceller  algorithm  is  an  essential  part  of 
the  wideband  system.  Because  of  the  inherent  nonlinearity 
of  the  LPC  analyzer  a  fast  convergent  algorithm  is 
necessary. 

The test-bed simulation includes four different
algorithms that can be chosen separately by the user: 1) the
Widrow LMS algorithm [5], 2) the normalized Widrow LMS
algorithm [6], 3) the gradient lattice adaptive filter [7],
and 4) the unconstrained frequency domain LMS algorithm
(UFLMS) [8].


The Widrow algorithm was inserted in the test-bed
simulation for comparison purposes, since it is the most
popular algorithm in echo cancelling [9]. The gradient
lattice algorithm was implemented because of the
potential fast convergence reported in the literature [7],
which turned out not to hold for our application. The
unconstrained frequency domain LMS algorithm was chosen for
its efficient implementation and fast convergence compared
to the time domain LMS algorithm. The SHARF algorithm was
also considered for its simple implementation, since the filter model
is a recursive filter which can have a low order compared to
the FIR model. Since the performance of the SHARF algorithm
was very poor compared to the Widrow algorithm, we will not
elaborate on this algorithm in this report, and the
algorithm is not included in the final test-bed simulation.

All  the  parameters  needed  to  control  the  different 
adaptive  filter  algorithms,  such  as  filter  orders  and 
convergence  constants,  can  be  chosen  by  the  user  during  the 
initialization  phase  of  the  simulation  program.  A  complete 
description  of  those  algorithms  is  given  in  Chapter  4. 
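
For orientation, a textbook form of the normalized LMS echo canceller is sketched below. It is not the report's implementation, which is described in Chapter 4. The echo path, filter order, step size `mu` and the white-noise far-end signal are illustrative assumptions, and there is no near-end speech or channel noise.

```python
import math
import random

def convolve(h, x):
    """Causal FIR filtering of x by impulse response h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def nlms_echo_canceller(receive, transmit, order, mu=0.5, eps=1e-8):
    """Adapt FIR weights so that the filtered receive signal tracks the
    echo in the transmit signal; returns final weights and the residual."""
    weights = [0.0] * order
    history = [0.0] * order            # most recent receive samples, newest first
    residual = []
    for x, d in zip(receive, transmit):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate          # echo left after cancellation
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        residual.append(e)
    return weights, residual

def power(x):
    return sum(v * v for v in x) / len(x)

rng = random.Random(0)
receive = [rng.gauss(0.0, 1.0) for _ in range(3000)]   # far-end signal
true_path = [0.3, -0.2, 0.1, 0.05]                     # hypothetical echo path
transmit = convolve(true_path, receive)                # echo only
weights, residual = nlms_echo_canceller(receive, transmit, order=8)

# Echo return loss enhancement over the last 500 samples, in dB.
erle_db = 10.0 * math.log10(power(transmit[-500:]) / (power(residual[-500:]) + 1e-30))
```

In this idealized setting the weights converge to the assumed echo path. With speech input, a vocoder in the loop and double talking, convergence would be slower and less complete, which is the behavior the test-bed is meant to examine.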

c)  Double  talker  detector 

The main function of the double talker detector is to
detect near-end talking and tell the echo canceller not
to adapt when the near-end speaker is talking.

If a double talker detector is not present, the adaptive
algorithm will drift in the wrong direction during double
talking, which gives poor echo reduction and may distort
the signal, depending on the amount of drift present [10].

In the simulation program we have the option to run the
experiment with or without the double talker detector. The
program includes two different double talker
algorithms. One is for a time domain algorithm in which the
decision is made on a point-by-point basis; the other is for
the frequency domain algorithm where the decision is made on
a block-by-block basis.

e) Channel simulation

The  channel  delay  of  the  digital  4-wire  transmission 
line  is  simulated  by  a  programmable  digital  delay.  The 
length  of  the  delay  can  be  controlled  by  the  user.  In  this 
study  we  assume  that  we  are  dealing  with  an  ideal  digital 
transmission  line  and  no  errors  are  introduced  by  the 
channel. 
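
A programmable delay of this kind can be sketched as a simple first-in, first-out buffer. The class name, the helper function and the 8 kHz sampling rate are assumptions made for illustration; the report does not give them in this section.

```python
from collections import deque

def delay_in_samples(delay_ms, sample_rate_hz=8000):
    """Convert a channel delay in milliseconds to a whole number of samples."""
    return int(round(delay_ms * sample_rate_hz / 1000.0))

class DelayLine:
    """Error-free channel that delays its input by a fixed number of samples."""

    def __init__(self, delay_samples):
        # Pre-filled with silence so the first outputs are zeros.
        self._buffer = deque([0.0] * delay_samples)

    def process(self, sample):
        if not self._buffer:           # zero delay: pass straight through
            return sample
        self._buffer.append(sample)
        return self._buffer.popleft()

channel = DelayLine(3)
delayed = [channel.process(s) for s in [1.0, 2.0, 3.0, 4.0, 5.0]]
one_way_satellite = delay_in_samples(270.0)    # half of a 540 ms round trip
```

The input reappears unchanged three samples later, which matches the ideal, error-free channel assumed in this study.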

f) AGC simulation

At first look, an AGC is intuitively very attractive as
a solution to the dynamic range problem at the LPC analyzer
input. However, after a very careful examination of its
interaction with other algorithms, we arrive at the
conclusion that the AGC that can be introduced in the
full-duplex network will be very complex and moreover will
limit the echo canceller performance to an unacceptable
level.

For  that  reason,  an  AGC  is  not  included  in  the  test-bed 
simulation.  Instead,  we  chose  the  improved  analyzer  as  a 
solution  to  the  dynamic  range  problem.  A  full  examination 
of the AGC is given in Chapter 5.

3. Hybrid Simulation

The  hybrid  is  an  analog  three  port  device  with  one 
input/output  at  the  2-wire  line,  and  the  receive  input  and 
the  transmit  output  at  the  4-wire  line  [Fig.  5].  An  exact 
simulation  of  the  hybrid  is  very  tedious  and  complex  because 
of  its  multiple  input/output  and  its  inherent 
nonlinearities.  As  a  first-order  approximation,  the  hybrid 
responses  from  the  2-wire  line  to  the  4-wire  line  and 
vice-versa  can  be  approximated  by  a  flat  unity  gain.  The 
standard  simulation  of  the  transhybrid  response,  from  the 
4-wire  line  receive  to  transmit  line,  is  an  FIR  filter. 

In this section we present transhybrid response
measurements made on physical telephone lines connected to a
physical hybrid. Since in using a physical telephone line we did
not control the loading conditions of the hybrid, the same
measurements were repeated using an artificial telephone
line, where loading conditions were totally controlled.

The  transhybrid  measurements  were  made  using  standard 
system  identification  techniques  under  linear  assumption. 
To  check  the  accuracy  of  this  assumption,  a  new  nonlinear 
frequency  domain  adaptive  filter  has  been  used  to  check  the 
second-order  nonlinearity  in  the  hybrid. 

The use of the precisely balanced transformer windings
to obtain conjugacy between transmission paths results in the
so-called hybrid circuits. These can be realized with a single
transformer structure, but the impedance levels required are
usually inconvenient. The more common realization uses two
transformers connected as shown by the simplified diagram of
Fig. 2-4. Transformers T1 and T2 each consist of at least three
tightly coupled windings.

If Z1 = Z2 and Z3 = Z4, a proper choice of turns ratios will
make port 1 conjugate to port 2, and port 3 conjugate to port 4.
That is, if Z1 is a source delivering power to port 1, a
negligible part of this power will be received by impedance Z2
and vice versa. Power flowing into the circuit at either port 1
or port 2 will be delivered to impedances Z3 and Z4 equally.

In one practical application, Z3 is a bilateral two-wire
line, and Z4 is a fixed network whose only function is to match
Z3 and provide the necessary conjugacy. Impedances Z1 and Z2
represent a four-wire line using separate pairs for the two
directions of transmission. The terms trans-hybrid loss and
through-balance are used to describe the effectiveness of this
circuit. Losses of 50 dB between impedances Z1 and Z2 are
realizable. In central offices where Z3 is different for every
call that is set up, much lower values are common.


Fig. 5.  Hybrid circuits using two transformers and its
description. (From Transmission Systems for
Communication, Bell Telephone Laboratories)

3.1  Transhybrid Measurements under Linear Assumption

To measure the transhybrid response of the hybrid, a
standard system identification set-up was built. A
user at site A called site B, and the hybrid at site B
was connected to the phone line through a VCA (Voice
Connection Arrangement) as shown in Fig. 6. Bandlimited
white noise was used as the input to the 4-wire receive side
of the hybrid. The user at site A was silent to ensure that
the signal received at the 4-wire transmit side was only the
leakage signal. The input and output signals were digitized
simultaneously by a 2-channel A/D. The task then reduces
to a system identification problem, where the input and the
output of a black box are known and the goal is to find
the transfer function of the black box.

For system identification we used three different
algorithms: the standard Widrow LMS algorithm, a new
unconstrained frequency domain algorithm for FIR filter
representation, and one which uses the sequential regression
algorithm for IIR filter representation. We ran these
algorithms on the data collected from the set-up described
above. The output of each algorithm was the identified
impulse response for the FIR algorithms or the transfer
function for the IIR algorithm. Another result from the
algorithms was the SNR between the desired response signal
and the error obtained by the adaptive filters. Since the
SNR achieved by the IIR model was poor compared to the FIR
model, we continued our measurements with the FIR model only.


[Figure: block diagrams of the measurement set-up. Labels:
Site A, Site B, White noise.]

Details on the algorithms used in those measurements will be
given with the description of the echo cancelling algorithms.

The problem in using the physical telephone line for the
measurements was that no information was available about the
line, e.g., its length or whether it was loaded or nonloaded.
Therefore, a similar experiment was performed under
controlled conditions. The black box in Fig. 6 was replaced
by the black box in Fig. 7, where an artificial telephone
line was used for simulating telephone lines of different
lengths. H88 and D66 loading were used in the loaded lines.

An example of the transhybrid response obtained from
the physical telephone lines is given in Fig. 8. Fig. 8a
represents the transhybrid response for an unloaded line,
and Fig. 8b the transhybrid response for an actual loaded
phone line. In Fig. 9 we have a sample of transhybrid
responses under different conditions. Fig. 9a and 9b are
samples of unloaded artificial lines with lengths of 5000 ft
and 15,000 ft respectively. Fig. 9c and 9d present the
transhybrid response with different lengths and different
loading.

Since with the artificial telephone line we can control
both the length of the lines and the loading conditions, we
use it to build a library of different transhybrid
responses. The library includes 14 different responses
under different conditions. The library is built in a
format such that from the test-bed simulation program any
one of the fourteen responses can be chosen to simulate the
transhybrid response. The user can specify different
transhybrid responses for the near-end and far-end hybrids.
In addition, the user can dynamically change the transhybrid
response during the simulation by means of a software
"switch", and the time step for those changes can
also be controlled by the user. For example, the user can
specify that every 1 sec the transhybrid response will be
replaced by another response from the library.

Fig. 8.  Transhybrid response with the actual telephone lines.

Fig. 9.  Transhybrid response with the simulated telephone lines.
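
The switching mechanism described above can be sketched in a few lines. This is only an illustration and not the test-bed program itself: the function name `simulate_echo`, the two-entry toy library and the switching interval expressed in samples are assumptions introduced here. Each library entry is treated as an FIR impulse response, and the active response is replaced every `switch_every` samples.

```python
import numpy as np

def simulate_echo(rx, library, switch_every, start=0):
    """Pass the 4-wire receive signal rx through FIR transhybrid responses
    taken from `library`, moving to the next response every `switch_every`
    samples (hypothetical interface, for illustration only)."""
    n_taps = max(len(h) for h in library)
    padded = np.concatenate([np.zeros(n_taps - 1), rx])   # zero initial state
    echo = np.zeros(len(rx))
    for n in range(len(rx)):
        h = library[(start + n // switch_every) % len(library)]
        # samples x(n), x(n-1), ..., x(n-len(h)+1)
        window = padded[n + n_taps - len(h): n + n_taps][::-1]
        echo[n] = np.dot(h, window)
    return echo

# Toy library with two made-up responses: a direct path and a delayed,
# attenuated path. A real library would hold the 14 measured responses.
library = [np.array([1.0]), np.array([0.0, 0.5])]
rx = np.arange(1.0, 7.0)
echo = simulate_echo(rx, library, switch_every=3)
print(echo)
```

With a sampling rate of fs samples per second, a 1 sec switching interval corresponds to `switch_every = fs`.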

Many measures are used to check the performance of system
identification techniques. One such measure is
the signal-to-error ratio, which is the ratio of the
mean-square output of the system to the mean-square error
between the system and its model, when both the system and
its model are driven by the same white noise input. For a
64 weight FIR filter the signal-to-error ratio was around
25-28 dB; a longer FIR filter did not achieve better
results, and with 32 weights the signal-to-error ratio was
lower by one to two dB. With the 12-bit A/D used in our
simulation, this means that, under the linear assumption, the
maximum echo reduction that we can achieve is about
25-28 dB.
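
As an illustration of how such a figure is obtained, the sketch below identifies a synthetic 64-tap system with the Widrow LMS update and then evaluates the signal-to-error ratio. The impulse response, the noise floor of 30 dB and the step size are assumptions chosen for the example; they are not the measured hybrid data.

```python
import numpy as np

def lms_identify(x, d, n_taps, mu):
    """Widrow LMS: adapt an n_taps FIR model so that its output tracks d."""
    w = np.zeros(n_taps)
    for n in range(n_taps - 1, len(x)):
        u = x[n - n_taps + 1:n + 1][::-1]   # x(n), x(n-1), ..., x(n-n_taps+1)
        e = d[n] - w @ u                    # error between system and model
        w += 2.0 * mu * e * u               # LMS weight update
    return w

def signal_to_error_db(x, d, w):
    """Mean-square system output over mean-square modelling error, in dB."""
    y = np.convolve(x, w)[:len(x)]
    return 10.0 * np.log10(np.mean(d ** 2) / np.mean((d - y) ** 2))

rng = np.random.default_rng(0)
# Hypothetical decaying impulse response standing in for a transhybrid path.
h_true = rng.standard_normal(64) * np.exp(-0.1 * np.arange(64))
x = rng.standard_normal(20000)                        # white noise input
clean = np.convolve(x, h_true)[:len(x)]
# Additive noise 30 dB below the system output (assumed measurement floor).
noise = rng.standard_normal(len(x)) * np.sqrt(np.mean(clean ** 2)) * 10 ** (-30 / 20)
d = clean + noise
w = lms_identify(x, d, n_taps=64, mu=0.002)
ser_db = signal_to_error_db(x, d, w)
print(f"signal-to-error ratio: {ser_db:.1f} dB")
```

In this synthetic case the ratio saturates near the assumed noise floor, which mirrors the observation that a longer filter does not help once the residual is dominated by effects the linear model cannot represent.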

In  the  next  section  we  present  the  measurement  of  the 
transhybrid  response  under  nonlinear  conditions.  Our  aim 
was  to  check  the  possibility  of  simulating  a  more  accurate 
transhybrid  response. 

3.2  Transhybrid Response under Nonlinear Assumption

One of the major difficulties in dealing with nonlinear
system identification is the lack of a unified mathematical
theory for representing various nonlinear characteristics.
There are, however, a number of representations for
nonlinear system identification purposes. The quality of
those representations depends on the kind of
nonlinearity in the system. A well known representation is
the Volterra series, with which the least squares technique
can be easily applied to nonlinear systems.

The  time  domain  nonlinear  systems  algorithm  that  we 
will  describe  here  is  due  to  Roy  and  Sherman  [12].  Its  main 
drawback  is  the  amount  of  computations  needed.  To  overcome 
this  complexity  a  new  frequency  domain  nonlinear  algorithm 
has  been  developed  which  reduces  the  computation  by  an  order 
of  N,  where  N  is  the  filter  length. 

Since the time domain nonlinear algorithm is much more
easily explained and conveys the idea of the method, we
will present here only the time domain algorithm. The
details of the frequency domain algorithm are given in NSC
NOTE 147 [13]. Both algorithms converge to the same result.
The advantages of the frequency domain algorithm are its
reduced computation and its faster convergence compared
to the time domain algorithm.

3.2.1  Time Domain Nonlinear Adaptive Filter


The nonlinear system identification algorithm described
in [12] can be readily interpreted in an adaptive filtering
notation. The input-output relationship of a nonlinear
system can be expressed explicitly as a Volterra series [14,
15, 16]

\[
\begin{aligned}
y(t) ={}& \int_0^\infty h_1(\tau)\,x(t-\tau)\,d\tau
 + \int_0^\infty\!\!\int_0^\infty h_2(\tau_1,\tau_2)\,x(t-\tau_1)\,x(t-\tau_2)\,d\tau_1\,d\tau_2 \\
 &+ \cdots + \int_0^\infty\!\!\cdots\!\int_0^\infty h_n(\tau_1,\ldots,\tau_n)\prod_{i=1}^{n} x(t-\tau_i)\,d\tau_i + \cdots
\end{aligned}
\tag{3.1}
\]



The n-th order Volterra kernel $h_n(\tau_1,\tau_2,\ldots,\tau_n)$ represents
the weighting function of the n-th degree. Thus the n-th
order term is an n-fold convolution integral. If we assume
that the system is stable and has a finite memory, the
system can be approximated by its sampled data form; the
output can be written as

\[
\begin{aligned}
y(n) ={}& \sum_{i=0}^{N-1} h_1(i)\,x(n-i)
 + \sum_{i=0}^{N-1}\sum_{j=0}^{N-1} h_2(i,j)\,x(n-i)\,x(n-j) \\
 &+ \sum_{i=0}^{N-1}\sum_{j=0}^{N-1}\sum_{k=0}^{N-1} h_3(i,j,k)\,x(n-i)\,x(n-j)\,x(n-k) + \cdots
\end{aligned}
\tag{3.2}
\]


The objective of the nonlinear adaptive filter is to find
the system models $h_1(i)$, $h_2(i,j)$, ..., which minimize the
output mean square error. The system can be found by the
standard least squares technique, as done in the linear LMS
algorithm. For practical reasons only the quadratic form
will be developed here, but the same procedure can be
extended to higher orders. For the second order case the
output can be written in the following form:

\[ y(n) = \underline{u}^T(n)\,\underline{b} + v(n) \tag{3.3} \]

where $v(n)$ is random additive noise,

\[ \underline{u}^T(n) = [\,x(n),\,x(n-1),\ldots,x(n-N+1) \mid x^2(n),\,x(n)x(n-1),\ldots,x^2(n-N+1)\,] \tag{3.4} \]

and

\[ \underline{b}^T = [\,h_1(0),\,h_1(1),\ldots,h_1(N-1) \mid h_2(0,0),\,h_2(0,1),\ldots,h_2(N-1,N-1)\,] \tag{3.5} \]



The filter weight adaptation equations for the kth
iteration are

\[ h_1^{(k)}(i) = h_1^{(k-1)}(i) + \mu\,e(n)\,x(n-i) \tag{3.6} \]

\[ h_2^{(k)}(i,j) = h_2^{(k-1)}(i,j) + \mu\,e(n)\,x(n-i)\,x(n-j) \tag{3.7} \]

where $\mu$ is the convergence constant. The block diagram of
the time domain nonlinear algorithm is presented in Fig. 10.

From equations (3.3) to (3.7) we see that the number of
multiply-adds for each output point or iteration is of
order N^2, or O(N^2). For N output points the number of
multiply-adds is thus O(N^3). Because of this high
computational complexity this time domain nonlinear filter
is not widely used. A new frequency domain nonlinear
algorithm [13] achieves the same performance with order
N^2 multiply-adds for N output points, or O(N^2).
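
A minimal sketch of the time domain updates (3.6) and (3.7) is given below, assuming a synthetic second-order system. The kernel values, the filter length N = 4 and the step size are made up for the example; the helper `volterra2_output` is a hypothetical name used only to generate test data. Each iteration forms the N x N outer product of the input window, which is where the O(N^2) cost per output point comes from.

```python
import numpy as np

def volterra2_output(x, h1, h2):
    """Output of a second-order Volterra system, eq. (3.2) truncated at h2."""
    N = len(h1)
    xp = np.concatenate([np.zeros(N - 1), x])
    y = np.zeros(len(x))
    for n in range(len(x)):
        u = xp[n:n + N][::-1]              # x(n), x(n-1), ..., x(n-N+1)
        y[n] = h1 @ u + u @ h2 @ u
    return y

def volterra2_lms(x, d, N, mu):
    """Adapt linear and quadratic kernels with the updates (3.6) and (3.7)."""
    h1 = np.zeros(N)
    h2 = np.zeros((N, N))
    xp = np.concatenate([np.zeros(N - 1), x])
    err = np.zeros(len(x))
    for n in range(len(x)):
        u = xp[n:n + N][::-1]
        uu = np.outer(u, u)                # products x(n-i) x(n-j): O(N^2) work
        e = d[n] - (h1 @ u + np.sum(h2 * uu))
        h1 += mu * e * u                   # eq. (3.6)
        h2 += mu * e * uu                  # eq. (3.7)
        err[n] = e
    return h1, h2, err

rng = np.random.default_rng(1)
h1_true = np.array([0.5, -0.3, 0.2, 0.1])              # assumed linear kernel
A = rng.standard_normal((4, 4))
h2_true = 0.05 * (A + A.T)                             # assumed symmetric quadratic kernel
x = rng.standard_normal(30000)
d = volterra2_output(x, h1_true, h2_true)
h1, h2, err = volterra2_lms(x, d, N=4, mu=0.002)
print(np.max(np.abs(h1 - h1_true)), np.max(np.abs(h2 - h2_true)))
```

Because the products x(n-i)x(n-j) and x(n-j)x(n-i) are identical, only the symmetric part of the quadratic kernel is identifiable, which is why the example uses a symmetric `h2_true`.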

3.2.2  Measurement Results of a Nonlinear Model

For the nonlinear transhybrid response measurement we
use the same data collected as described in section 3.1.
Instead of using the linear adaptive filter to identify the
system, we use a new frequency domain nonlinear adaptive
filter as described in [13]. This algorithm was chosen
because it has lower computational complexity and faster
convergence than the time domain nonlinear adaptive filter.

The results of the nonlinear system identification
algorithm are presented in Figs. 11 and 12. Fig. 11 presents
the convergence behavior of the linear and nonlinear
frequency domain algorithms. The convergence behavior is
given in terms of signal-to-error ratio in dB versus number
of iterations. From the plots in Fig. 11 we see that the
linear algorithm achieves an average signal-to-error ratio of
28 dB. In those results we used a filter of order 70;
higher orders did not achieve better results. Fig. 12
presents the actual two-dimensional response as identified
by the algorithm.

Those results were obtained from one hybrid, and we
do not know the average statistical behavior over different
hybrids. Since in the nonlinear algorithm we model only the
second order Volterra kernel, our results indicate
that the main nonlinearity in the hybrid is a second-order
nonlinearity.


[Figure: Second order time domain Volterra kernel for a
physical hybrid. Top view and bottom view (32 x 32 points).]

The main conclusion from this experiment is that the
performance of a linear echo canceller is limited by the
nonlinearity of the hybrid. The main nonlinearity can be
modeled by a second-order Volterra kernel. If for certain
applications higher echo reduction than the one achieved by
the linear algorithm is needed, one can use the nonlinear
frequency domain algorithm. The complexity of this
algorithm is O(N) per output point, like the Widrow
algorithm, where N is the assumed order of the filter. We
have to note that with this new algorithm we solved the
computational complexity but not the problem of the large
amount of memory needed. The amount of memory needed is
O(N^2), compared to O(N) in the linear adaptive filter.

4.  Echo Cancelling Algorithms

In  the  study  of  echo  cancelling  algorithms  for  our 
application,  the  main  goal  was  to  have  an  algorithm  which 
converges  fast  enough  and  still  competes  with  the 
implementation  simplicity  of  the  LMS  algorithm.  In  our 
application,  a  faster  convergence  is  needed  than  in  the 
standard  telephone  line  echo  cancelling  problem.  From 
experiments  presented  in  Chapter  6,  we  observe  that  the 
effect  of  the  echo  is  much  more  destructive  with  the 
presence  of  an  LPC  analyzer  in  the  loop. 

In our search for a fast and yet simple algorithm we
checked a number of new recursive adaptive algorithms
proposed recently in the literature. We checked the simple
HARF (SHARF) [9] algorithm and the modified hyperstable
adaptive recursive filter MHARF [17]. Those algorithms
performed very poorly as echo canceller algorithms; for that
reason they are not included in the final test-bed
simulation program.

The next step was to check nonrecursive adaptive
filters with complexity similar to that of the LMS algorithm.
The promising candidates were the gradient lattice algorithm
[7] and a new frequency domain algorithm [8]. The gradient
lattice algorithm was attractive because of its faster
convergence, compared to the LMS algorithm, reported in the
literature. The unconstrained frequency domain algorithm is
attractive because of the "pseudo-orthogonality" of the
input signal achieved through the use of the DFT.

The gradient lattice adaptive filter algorithm is well
documented in the literature. Since there are a number of
gradient lattice algorithms with minor differences in the
literature, we will describe only the gradient lattice
algorithm implemented in the test-bed simulation. The new
frequency domain algorithm inserted in the test-bed
simulation will be presented in more detail, with its
convergence rate compared to the LMS algorithm by a simple
system identification simulation. After the presentation of
the adaptive algorithms inserted in the test-bed simulation
we will present the double talker detection algorithm
associated with each algorithm.

4.1  The  Gradient  Lattice  Algorithm 

The FIR gradient lattice algorithm is derived, like the
LMS algorithm, via a mean square error minimization criterion.
The main difference is that in the gradient lattice
algorithm the adaptation is done in two stages. The first
stage consists of an adaptive Gram-Schmidt
orthogonalization of the input signal. The second stage is
a standard LMS algorithm under the assumption that the input
signal is already orthogonal. It has been shown that the
gradient lattice achieves faster convergence than the LMS
algorithm for highly correlated stationary input signals
[7].

The block diagram of the algorithm is given in Fig. 13.
The equations controlling the different parameters of the
algorithm are given below.

x(n) - input signal to the adaptive filter
d(n) - desired response of the adaptive filter
$b_0(n) = f_0(n) = x(n)$.

The adaptive equations for the ith stage are:

\[ K_i(n+1) = K_i(n) + \frac{\mu_1}{\sigma_i(n)}\,\{\, f_{i+1}(n)\,b_i(n-1) + b_{i+1}(n)\,f_i(n) \,\} \tag{4.1} \]

and

\[ g_i(n+1) = g_i(n) + \frac{\mu_2}{P_i(n)}\,\bigl(e_i(n)\,b_i(n)\bigr)\,; \quad i = 0,1,2,\ldots,N \tag{4.2} \]

where

\[ \sigma_i(n) = \beta\,\sigma_i(n-1) + (1-\beta)\,[\,b_i^2(n-1) + f_i^2(n)\,] \tag{4.3} \]

and

\[ P_i(n) = \beta\,P_i(n-1) + (1-\beta)\,b_i^2(n) \tag{4.4} \]

are the input power estimates at the i-th stage. The
reflection coefficient $K_i(n)$ defines the predictor at
iteration n, and $g_i(n)$ defines the filter gain at iteration
n. The variables $f_i(n)$, $b_i(n)$ and $e_i(n)$ are the forward
prediction errors, backward prediction errors, and the
filter error for the i-th stage of the filter at iteration
n. $\mu_1$ and $\mu_2$ are the convergence constants, and $\beta$ is the
energy smoothing constant.

This algorithm has been inserted in the test-bed
simulation program. The user can specify the order of the
adaptive filter N, the convergence constants $\mu_1$, $\mu_2$ and the
energy smoothing constant $\beta$.
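
The following sketch shows how updates of the form (4.1) to (4.4) fit together. It is not the test-bed code. The order-update recursions for the forward, backward and filter errors are not listed in the text, so the usual lattice recursions are assumed here and marked in the comments; the initial power estimates, the AR(1) test input and all numerical constants are likewise assumptions.

```python
import numpy as np

def gradient_lattice(x, d, order, mu1, mu2, beta):
    k = np.zeros(order)            # reflection coefficients K_i
    g = np.zeros(order + 1)        # filter gains g_i, i = 0..N
    sigma = np.ones(order)         # power estimates (4.3); initial value assumed
    P = np.ones(order + 1)         # power estimates (4.4); initial value assumed
    b_prev = np.zeros(order + 1)   # backward errors b_i(n-1)
    err = np.zeros(len(x))
    for n in range(len(x)):
        f = np.zeros(order + 1)
        b = np.zeros(order + 1)
        f[0] = b[0] = x[n]
        for i in range(order):
            # Assumed standard lattice order updates (not given in the text).
            f[i + 1] = f[i] - k[i] * b_prev[i]
            b[i + 1] = b_prev[i] - k[i] * f[i]
            sigma[i] = beta * sigma[i] + (1 - beta) * (b_prev[i] ** 2 + f[i] ** 2)
            k[i] += mu1 / sigma[i] * (f[i + 1] * b_prev[i] + b[i + 1] * f[i])  # (4.1)
        e = d[n]
        for i in range(order + 1):
            P[i] = beta * P[i] + (1 - beta) * b[i] ** 2                        # (4.4)
            e -= g[i] * b[i]       # assumed stage error: e_i = e_(i-1) - g_i b_i
            g[i] += mu2 / P[i] * e * b[i]                                      # (4.2)
        err[n] = e
        b_prev = b
    return k, g, err

rng = np.random.default_rng(2)
n_samples = 20000
white = rng.standard_normal(n_samples)
x = np.zeros(n_samples)
for n in range(1, n_samples):      # highly correlated AR(1) input (assumed test signal)
    x[n] = 0.9 * x[n - 1] + white[n]
h = rng.standard_normal(8)         # hypothetical 8-tap system to identify
d = np.convolve(x, h)[:n_samples]
k, g, err = gradient_lattice(x, d, order=7, mu1=0.002, mu2=0.005, beta=0.99)
print(k[0], np.mean(err[-5000:] ** 2) / np.mean(d[-5000:] ** 2))
```

For the AR(1) input used here, the first reflection coefficient should settle near the one-lag correlation coefficient of the input (0.9), which is the orthogonalization step described above.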

4.2  The Unconstrained Frequency Domain LMS Adaptive Filter

4.2.1.  Introduction 

A great effort has been made to develop efficient
spectral techniques based on the FFT. Many techniques have
been proposed [18,19,20,21] and although they achieve good
performance for very restricted applications, they converge
to a biased suboptimal solution. For instance, two earlier
frequency domain approaches by Dentino et al. [19] achieve
good performance as a line-enhancer, and the algorithm by
Walzman and Schwartz [21] is specially designed for the
isolated training pulse situation, for which there is very
limited use. Although the use of the frequency domain is very
attractive (because of the FFT), the key difficulty that has
prevented prior work on adaptive filtering from being
effective is that the filter must perform linear
convolutions whereas the FFT is intrinsically suited for
circular convolution.

Recently a more general frequency domain adaptive
filter algorithm (FLMS), which converges to the optimal
(Wiener) solution, was proposed by Ferrara [22]. The FLMS
is efficient for a large number of taps and can be used in
general adaptive filtering applications. However, the major
drawback of the FLMS algorithm is its slow convergence for
highly correlated input signals, as in the case of the time
domain LMS algorithm. The FLMS algorithm requires five
DFTs in every iteration; two of them are needed to impose
a time domain constraint in which the last N points of the
time domain impulse response are forced to zero.

In this section we introduce a new frequency domain
adaptive filtering algorithm that converges to the optimal
Wiener solution without the need for any constraints. We
demonstrate that the proposed algorithm achieves both faster
convergence and reduced complexity compared to the FLMS
algorithm. This unconstrained frequency domain algorithm
(UFLMS), with a normalized convergence constant, achieves
faster convergence for an input signal whose covariance
matrix has highly disparate eigenvalues.

The UFLMS is presented in section 4.2.2. In 4.2.3
the algorithm with an adaptive convergence constant is
introduced and the performance of the UFLMS is compared to
the LMS algorithms in a system identification simulation.
In 4.2.4 the complexity of the proposed algorithm is
described in terms of number of multiply-adds and storage
required. The proof of convergence of the UFLMS to the
Wiener solution is given in [8] and will not be given in
this report.

4.2.2.  The Frequency Domain Adaptive Filter


To simplify the presentation of the frequency domain
algorithm, a brief review of the LMS algorithm (Fig. 14) will be
given.

The LMS algorithm is a time domain adaptive filter. For each
iteration, the weights of the transversal filter $\underline{w}_j$ are adapted
according to the equation

\[ \underline{w}_{j+1} = \underline{w}_j + 2\mu\,e_j\,\underline{x}_j \tag{4.5} \]

where $\underline{x}_j$ is the state vector of input samples stored in the
adaptive filter,

\[ \underline{x}_j = (x_j,\,x_{j-1},\,\ldots,\,x_{j-N+1}) \tag{4.6} \]

The error at the jth iteration is $e_j$ and is defined by

\[ e_j = d_j - y_j \tag{4.7} \]

where

\[ y_j = \underline{w}_j^T\,\underline{x}_j \tag{4.8} \]

and $d_j$ is the desired response of the adaptive filter.

In the time domain adaptive filter, $e_j$ and $y_j$ are scalars. In
the frequency domain adaptive filter the output of the filter and
the error are vectors. New definitions of the input/output are
needed in the frequency domain adaptive filter. Capital letters
will denote the frequency domain variables, lower case letters the
time domain variables, and underlined letters denote vectors or
matrices.

The UFLMS algorithm is based on the well known "overlap-save"
method used in fast convolution [20]. To clarify the input/output
formulations and notations used in the derivation of the UFLMS
algorithm, we first introduce the "overlap-save" method using
matrix notation.

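We assume that the order of the digital filter is less than or
equal to N + 1. We define a 2N impulse response vector $\underline{h}$ in the
following way:

\[
\begin{aligned}
\underline{h}(i) &= h(i), & i &= 0,1,\ldots,N \\
\underline{h}(i) &= 0, & i &= N+1,\ldots,2N-1
\end{aligned}
\tag{4.9}
\]

The input data stream x(n) is segmented into 2N point vectors with
N points of overlap in the following way:

\[ x_k(n) = x(kN + n), \quad n = 0,1,\ldots,2N-1, \quad k = 0,1,\ldots \tag{4.10} \]

Using the matrix notation of circular convolution, and dropping k
for simplicity in notation, the output vector $\underline{y}_k$ will be

\[
\begin{bmatrix} y_{2N-1} \\ y_{2N-2} \\ y_{2N-3} \\ \vdots \\ y_1 \\ y_0 \end{bmatrix}
=
\begin{bmatrix}
x(0) & x(1) & \cdots & x(2N-1) \\
x(2N-1) & x(0) & \cdots & x(2N-2) \\
x(2N-2) & x(2N-1) & \cdots & x(2N-3) \\
\vdots & & & \vdots \\
x(2) & x(3) & \cdots & x(1) \\
x(1) & x(2) & \cdots & x(0)
\end{bmatrix}
\begin{bmatrix} 0 \\ \vdots \\ 0 \\ h_N \\ \vdots \\ h_1 \\ h_0 \end{bmatrix}
\tag{4.11}
\]
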

We can see by inspection that, since the first N-1 points of
the vector $\underline{h}$ are zero, the last N points in the output vector
$\underline{y}$ are the result of a linear convolution. In the "overlap-save"
method this matrix operation is done using FFT techniques, which
give a high computational efficiency. We see that in each iteration
k we have 2N data inputs and N data outputs; for that reason the
input data is overlapped by N points.
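
The "overlap-save" procedure can be checked numerically with a short sketch. The function name `overlap_save` and the test signal are assumptions for the example; the block uses ordinary FFT calls instead of the explicit circulant matrix of (4.11). Each 2N point frame overlaps the previous one by N points, and only the last N points of each circular convolution are kept.

```python
import numpy as np

def overlap_save(x, h, N):
    """Linear convolution of x with h (len(h) <= N + 1) using 2N-point FFTs."""
    H = np.fft.fft(h, 2 * N)                  # zero-padded response, as in (4.9)
    n_blocks = len(x) // N
    # Prepend N zeros so that the first frame has a "previous" block.
    xp = np.concatenate([np.zeros(N), x[:n_blocks * N]])
    y = np.zeros(n_blocks * N)
    for k in range(n_blocks):
        frame = xp[k * N:k * N + 2 * N]       # 2N inputs, N overlap with frame k-1
        yc = np.fft.ifft(np.fft.fft(frame) * H).real   # circular convolution
        y[k * N:(k + 1) * N] = yc[N:]         # last N points: linear convolution
    return y

rng = np.random.default_rng(3)
N = 8
h = rng.standard_normal(N + 1)                # filter of length N + 1
x = rng.standard_normal(64)
y_fast = overlap_save(x, h, N)
y_ref = np.convolve(x, h)[:len(y_fast)]       # direct linear convolution
print(np.max(np.abs(y_fast - y_ref)))
```

The first N points of each circular convolution are discarded because they contain wrap-around terms, which is the reason each iteration consumes 2N inputs but produces only N outputs.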

Now  we  will  introduce  different  matrix  operations  that  we  will 
use  in  the  derivation  of  the  UFLMS  algorithm. 

Notation and Definitions

Let $\underline{F}$ be a symmetric 2N x 2N matrix whose elements are
$F_{kj} = \exp(-i\,(2\pi/2N)\,kj)$, k,j = 0,1,...,2N-1, where i is the
square root of -1. When $\underline{F}$ operates on a column vector of order 2N
the result is a column vector representing the DFT of the original
vector.

Let $\underline{F}^{-1}$ be the inverse of the matrix $\underline{F}$. One can show that
$\underline{F}^{-1}$ is a symmetric 2N x 2N matrix whose elements are
$F^{-1}_{kj} = (1/2N)\exp(i\,(2\pi/2N)\,kj)$. It can be shown that $\underline{F}$ and
$\underline{F}^{-1}$ have the following properties:

\[ \underline{F}^{\dagger} = 2N\,\underline{F}^{-1} \quad\text{and}\quad (\underline{F}^{-1})^{\dagger} = \frac{1}{2N}\,\underline{F}, \]

where $\dagger$ denotes the complex conjugate transpose.

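Let $\underline{x}_k$ be the circulant matrix defined by equations (4.10) and
(4.11) for the iteration k.

Now define:

\[ \underline{X}_k = \underline{F}\,\underline{x}_k\,\underline{F}^{-1} \]

Since $\underline{x}_k$ is a circulant matrix, $\underline{X}_k$ is a diagonal matrix whose
elements are the DFT output of the first row of the circulant
matrix $\underline{x}_k$ (for details see [25]). Using the properties of the
matrices $\underline{F}$ and $\underline{F}^{-1}$ we have
$\underline{X}_k^{\dagger} = \underline{F}\,\underline{x}_k^{T}\,\underline{F}^{-1}$.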

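Let $\underline{b}$ be a 2N x 2N windowing matrix whose lower N x N right
corner is the identity matrix:

\[ \underline{b} = \begin{bmatrix} \underline{0} & \underline{0} \\ \underline{0} & \underline{I} \end{bmatrix} \]

By inspection, $\underline{b}\,\underline{b} = \underline{b}$. We define

\[ \underline{H} = \underline{F}\,\underline{b}\,\underline{F}^{-1} . \]

Since $\underline{b}$ is diagonal, $\underline{H}$ is a circulant matrix whose first row
is the inverse DFT of the vector (0...0 1 1 ... 1).

The following equalities hold for $\underline{H}$:

\[ \underline{H}\,\underline{H} = \underline{F}\,\underline{b}\,\underline{F}^{-1}\,\underline{F}\,\underline{b}\,\underline{F}^{-1} = \underline{F}\,\underline{b}\,\underline{F}^{-1} = \underline{H} , \tag{4.12} \]

\[ \underline{H}^{\dagger} = (\underline{F}^{-1})^{\dagger}\,\underline{b}^{\dagger}\,\underline{F}^{\dagger} = \underline{F}\,\underline{b}\,\underline{F}^{-1} = \underline{H} . \tag{4.13} \]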

Let $\underline{\mu}$ be a 2N x 2N real diagonal positive definite matrix, where
the elements on its diagonal represent the convergence constant for
each frequency in the UFLMS algorithms to be introduced later.

Let $\underline{d}_k$, $\underline{w}_k$, $\underline{e}_k$, $\underline{y}_k$ be 2N point vectors representing the
desired response, the impulse response, the error vector and the
output vector in the time domain.

Finally, let $\underline{D}_k$, $\underline{W}_k$, $\underline{E}_k$, $\underline{Y}_k$ be the 2N point DFTs of the
vectors $\underline{d}_k$, $\underline{w}_k$, $\underline{e}_k$, $\underline{y}_k$ respectively.
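
The matrix identities above are easy to verify numerically. The sketch below is only a check under stated assumptions: N = 4 and the random data are arbitrary, and the circulant matrix is built with each row shifted one place to the right, following the layout of (4.11). With this particular shift convention and the DFT matrix as defined, the diagonal of the transformed matrix equals the DFT of the first column, which for real data is the complex conjugate of the DFT of the first row.

```python
import numpy as np

N = 4
n2 = 2 * N
idx = np.arange(n2)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / n2)   # DFT matrix
Finv = np.conj(F) / n2                              # its inverse

rng = np.random.default_rng(4)
x = rng.standard_normal(n2)
# Circulant matrix with first row x; row r is x shifted right by r places.
C = np.array([np.roll(x, r) for r in range(n2)])
X = F @ C @ Finv                                    # should be diagonal

b = np.diag(np.concatenate([np.zeros(N), np.ones(N)]))  # windowing matrix
H = F @ b @ Finv

off_diagonal = X - np.diag(np.diag(X))
print(np.max(np.abs(off_diagonal)))                 # numerically zero
print(np.allclose(H @ H, H), np.allclose(H.conj().T, H))
```

The last line checks the idempotence (4.12) and the Hermitian symmetry (4.13) of the transformed window.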

Using the above definitions and the block diagram of Fig. 15,
we can present the unconstrained frequency domain adaptive filter.
The configuration is based on the "overlap-save" technique [24].
The diagonal matrix $\underline{X}_k$, derived from the DFT of the 2N point input
vector at iteration k, is multiplied by $\underline{W}_k$, the DFT of the impulse
response, to give the output $\underline{Y}_k$,

\[ \underline{Y}_k = \underline{X}_k\,\underline{W}_k \tag{4.14} \]

The filter output in the time domain is

\[ \underline{y}_k = \underline{F}^{-1}\,\underline{Y}_k \tag{4.15} \]
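
The error in the time domain before windowing is

\[ \underline{e}'_k = \underline{d}_k - \underline{y}_k . \tag{4.16} \]

According to the "overlap-save" method, when the filter length is
N+1 only the last N points of $\underline{y}_k$ are the result of a linear
convolution; the error vector must therefore be windowed by $\underline{b}$ to
give

\[ \underline{e}_k = \underline{b}\,(\underline{d}_k - \underline{y}_k) = \underline{b}\,(\underline{d}_k - \underline{F}^{-1}\,\underline{X}_k\,\underline{W}_k) . \tag{4.17} \]

Since we want to adapt the coefficients in the frequency domain, we
take the DFT of equation (4.17) by multiplying both sides by $\underline{F}$,

\[
\begin{aligned}
\underline{E}_k &= \underline{F}\,\underline{e}_k = \underline{F}\,\underline{b}\,(\underline{d}_k - \underline{F}^{-1}\,\underline{X}_k\,\underline{W}_k) \\
\underline{E}_k &= \underline{F}\,\underline{b}\,\underline{F}^{-1}\,(\underline{F}\,\underline{d}_k - \underline{F}\,\underline{F}^{-1}\,\underline{X}_k\,\underline{W}_k) \\
\underline{E}_k &= \underline{H}\,(\underline{D}_k - \underline{X}_k\,\underline{W}_k)
\end{aligned}
\tag{4.18}
\]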

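To derive the adaptive algorithm, we can follow the same steps used
to derive the LMS algorithm. Assuming that the input signal is
stationary and the identified system is time-invariant, we can
show that the expected value of the squared error
$\mathcal{E}\{|\underline{E}_k|^2\}$ is quadratic in $\underline{W}_k$. By using the gradient
method, the algorithm will converge to some $\underline{W}^*$ that achieves the
minimum mean square error. The squared error at each iteration is

\[ |\underline{E}_k|^2 = \underline{E}_k^{\dagger}\,\underline{E}_k = (\underline{D}_k^{\dagger} - \underline{W}_k^{\dagger}\,\underline{X}_k^{\dagger})\,\underline{H}^{\dagger}\,\underline{H}\,(\underline{D}_k - \underline{X}_k\,\underline{W}_k) , \tag{4.19} \]

and, as $\underline{H}$ is symmetric and idempotent,

\[ |\underline{E}_k|^2 = (\underline{D}_k^{\dagger} - \underline{W}_k^{\dagger}\,\underline{X}_k^{\dagger})\,\underline{H}\,(\underline{D}_k - \underline{X}_k\,\underline{W}_k) . \tag{4.20} \]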

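Assume for now that the weight vector $\underline{W}_k$ is independent of $\underline{X}_k$.
This assumption serves only to motivate the design of the algorithm
and is not used in the convergence proof. The expected squared
error will be

\[ \mathcal{E}\{|\underline{E}_k|^2\} = \mathcal{E}\{\underline{D}_k^{\dagger}\,\underline{H}\,\underline{D}_k\} - 2\,\mathcal{E}\{\underline{D}_k^{\dagger}\,\underline{H}\,\underline{X}_k\}\,\underline{W}_k + \underline{W}_k^{\dagger}\,\mathcal{E}\{\underline{X}_k^{\dagger}\,\underline{H}\,\underline{X}_k\}\,\underline{W}_k , \tag{4.21} \]

where $\mathcal{E}$ denotes expected value.

Equation (4.21) is quadratic in $\underline{W}_k$. The optimal vector $\underline{W}^*$ is
obtained by setting the gradient with respect to $\underline{W}_k$ equal to zero,

\[ \nabla_{\underline{W}_k} = -2\,\mathcal{E}\{\underline{D}_k^{\dagger}\,\underline{H}\,\underline{X}_k\} + 2\,\underline{W}_k^{\dagger}\,\mathcal{E}\{\underline{X}_k^{\dagger}\,\underline{H}\,\underline{X}_k\} = 0 . \tag{4.22} \]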

The UFLMS algorithm uses the gradient method to solve equation
(4.22). According to this method the "next" weight vector $\underline{W}_{k+1}$ is
equal to the present weight vector $\underline{W}_k$ plus an increment
proportional to the negative gradient,

\[ \underline{W}_{k+1} = \underline{W}_k - \underline{\mu}\,\nabla_{\underline{W}_k} \tag{4.23} \]

In the UFLMS algorithm the gradient vector is estimated by its
instantaneous value at iteration k. As in the LMS algorithm this
instantaneous gradient is used to approximate the real gradient.
From equation (4.22) the equation controlling the filter weights
will be

\[ \underline{W}_{k+1} = \underline{W}_k + 2\,\underline{\mu}\,\underline{X}_k^{\dagger}\,(\underline{H}\,\underline{D}_k - \underline{H}\,\underline{X}_k\,\underline{W}_k) , \tag{4.24} \]

or by using (4.18), equation (4.24) becomes

\[ \underline{W}_{k+1} = \underline{W}_k + 2\,\underline{\mu}\,\underline{X}_k^{\dagger}\,\underline{E}_k , \tag{4.25} \]

where the diagonal matrix $\underline{\mu}$ is the convergence constant as in the
LMS algorithm. As we shall see, we can choose different
convergence constants for different frequencies.

In [8] we prove that the adaptive filter is stable if $\mu(i,i)$
is bounded in terms of the maximum of $|X_k(i,i)|^2$ for all k. Here
we present the algorithm in a simplified way; in [8] we proved that
the algorithm converges to the optimal solution under the following
conditions.

1.  The input signal to the adaptive filter is ergodic,
stationary and bounded.

2.  The covariance matrix of the input signal has at least
rank N+1.

3.  The identified system is time invariant.

4.  The order of the identified system is less than or equal
to N+1.

4.2.3. Simulation results


From [8], we have to choose $0 < \mu_i < 1/M_i$, where $M_i$ is the
upper bound for the energy at frequency $i$. Since in practical
applications we do not know those bounds, we choose to estimate
$\mu_i$ by normalizing a constant convergence factor $\alpha$ by an
estimate of the energy at the $i$-th frequency. This normalization is
similar to the process used in the lattice adaptive filter [7]:

$$\mu_k(i) = \frac{\alpha}{z_k(i)} \qquad (4.26)$$

where

$$z_k(i) = (1 - \beta)\, z_{k-1}(i) + \beta\, X_k(i)\, X_k^{*}(i) \qquad (4.27)$$

where $i$ is the frequency index, $\alpha$ is the normalized convergence
factor for all the frequencies and $\beta$ is the energy smoothing
constant for all frequencies. As in the lattice adaptive filter
algorithm, different smoothing and normalization algorithms can be
applied depending on the application.
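
The per-bin weight update with the energy-normalized convergence factor of (4.27) can be sketched as follows. This is only a minimal illustration and not the simulation program used for the results reported here. The overlap-save block structure (N new points per 2N-point FFT), the function name `uflms_identify`, the small constant `eps` and the use of full complex FFTs are assumptions made for the example.

```python
import numpy as np

def uflms_identify(x, d, N, alpha=0.4, beta=0.8, eps=1e-12):
    """Unconstrained frequency-domain LMS sketch (overlap-save, N new points per block).

    x: input signal, d: desired response, N: block length (hypothetical interface).
    Returns the 2N frequency-domain weights and the time-domain error signal.
    """
    n_blocks = len(x) // N
    W = np.zeros(2 * N, dtype=complex)   # frequency-domain weights
    z = np.zeros(2 * N)                  # smoothed energy per frequency bin
    prev = np.zeros(N)                   # previous input block (overlap)
    err = np.zeros(n_blocks * N)
    for k in range(n_blocks):
        cur = np.asarray(x[k * N:(k + 1) * N], dtype=float)
        X = np.fft.fft(np.concatenate([prev, cur]))
        # Only the last N points of the circular convolution are valid outputs.
        y = np.fft.ifft(X * W).real[N:]
        e = d[k * N:(k + 1) * N] - y
        E = np.fft.fft(np.concatenate([np.zeros(N), e]))
        # Energy smoothing: z_k = (1 - beta) z_{k-1} + beta X X*
        z = (1.0 - beta) * z + beta * np.abs(X) ** 2
        # Normalized convergence factor per bin: mu = alpha / z
        mu = alpha / (z + eps)
        # Instantaneous-gradient update, applied independently in every bin.
        W = W + 2.0 * mu * np.conj(X) * E
        prev = cur
        err[k * N:(k + 1) * N] = e
    return W, err
```

Calling `uflms_identify(x, d, N=32)` with `d` obtained by convolving `x` with an unknown FIR response returns weights whose inverse FFT approximates that response in its first taps, provided the response is no longer than the block.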

To illustrate the rapid convergence rate of the UFLMS algorithm
for highly correlated input signals, two simulations of system
identification are discussed, one with an uncorrelated input signal
and the other with a highly correlated signal.

A  32  weight  F.I.R.  filter  was  chosen  as  the  system  to  be 
identified,  with  the  actual  values  of  the  impulse  response  given  in 

No.    NUMERATOR        DENOMINATOR

 1      0.230847E+00    0.100000E+01
 2     -0.102171E+00    0.000000E+00
 3      0.383698E-01    0.000000E+00
 4     -0.295674E-01    0.000000E+00
 5     -0.575074E-02    0.000000E+00
 6     -0.693523E-02    0.000000E+00
 7     -0.295320E-01    0.000000E+00
 8     -0.690188E-02    0.000000E+00
 9     -0.503678E-01    0.000000E+00
10      0.102334E-01    0.000000E+00
11     -0.448287E-01    0.000000E+00
12     -0.354443E-01    0.000000E+00
13     -0.331595E-01    0.000000E+00
14      0.515750E-02    0.000000E+00
15     -0.419697E-01    0.000000E+00
16      0.518027E-02    0.000000E+00
17     -0.163817E-01    0.000000E+00
18     -0.926803E-02    0.000000E+00
19     -0.896433E-03    0.000000E+00
20     -0.948155E-02    0.000000E+00
21     -0.180632E-02    0.000000E+00
22     -0.774926E-03    0.000000E+00
23     -0.647487E-02    0.000000E+00
24      0.329426E-02    0.000000E+00
25     -0.438749E-02    0.000000E+00
26      0.927637E-03    0.000000E+00
27     -0.200423E-04    0.000000E+00
28     -0.152551E-02    0.000000E+00
29      0.340074E-02    0.000000E+00
30     -0.310239E-02    0.000000E+00
31      0.551548E-02    0.000000E+00
32     -0.336223E-02    0.000000E+00

TABLE 1. COEFFICIENTS OF 31-TH ORDER ALL
ZERO FILTER (FIR)

Table 1. This F.I.R. filter approximates a transhybrid response
[26]. In the first experiment, the input signal x(n) was white
random noise uniformly distributed between -1000 and 1000. The
desired response $d_k$ was the result of the convolution of x(n) with
the F.I.R. filter. Since $d_k$ in our simulation was transformed back
from floating to integer representation, the additive noise in this
simulation was the quantization noise. Figure 16 presents the
block diagram of the computer simulation.

In Fig. 17, we have the convergence of the UFLMS and LMS
algorithms, presented as the S/N in dB between the energy in the
desired response signal and the energy in the error signal. We see
that both algorithms have almost the same convergence rate, as
expected from the discussion in [8]. For the LMS algorithm a $\mu$ of
$0.1 \times 10^{-5}$ was chosen to achieve the fastest convergence rate.
In the UFLMS algorithm $\alpha = 0.4$ and $\beta = 0.8$ were chosen to
achieve the same misadjustment error as in the LMS algorithm.

In the second experiment, the input signal was changed to a
highly correlated signal by passing the same white noise from the
first experiment through a 12th order all pole filter; the
coefficients of the filter are given in Table 2. The motivation was
to produce a highly correlated signal, and with this filter a
$\lambda_{\max}/\lambda_{\min}$ of 20 was achieved.

In Fig. 18 the convergence rate with the correlated signal is
given for the LMS and UFLMS algorithms. We notice that the
convergence of the UFLMS is almost the same with the correlated
signal as with the uncorrelated signal. The LMS algorithm has slow


Fig.  16.  Simulation  block  diagram 

No.    NUMERATOR        DENOMINATOR

 1      0.100000E+01     0.100000E+01
 2      0.000000E+00    -0.146034E+01
 3      0.000000E+00     0.126638E+01
 4      0.000000E+00    -0.850541E+00
 5      0.000000E+00     0.629542E+00
 6      0.000000E+00    -0.497503E+00
 7      0.000000E+00     0.273701E+00
 8      0.000000E+00    -0.168227E+00
 9      0.000000E+00     0.257914E+00
10      0.000000E+00    -0.238396E+00
11      0.000000E+00     0.508109E+00
12      0.000000E+00    -0.379440E+00
13      0.000000E+00     0.204267E+00

TABLE 2. COEFFICIENTS OF 12-TH ORDER ALL
POLES FILTER

Fig. 18. Convergence of LMS and UFLMS for correlated input.

convergence with the highly correlated signal. For the LMS
algorithm a $\mu$ = 0.2 x 10^[...] was chosen to achieve the fastest
convergence possible; for the UFLMS algorithm $\alpha = 0.09$ and
$\beta = 0.8$ were chosen to achieve the same misadjustment error.
These simulation results illustrate the advantage of the UFLMS
algorithm over the LMS algorithm for highly correlated signals.

4.2.4. Complexity of the Algorithm


For large N (filter length minus 1) the algorithm is very
efficient. To filter N points the conventional LMS algorithm requires
N iterations or 2N(N+1) real multiplies. For the same N points,
the UFLMS algorithm requires three FFT's of 2N real data points,
two 2N complex multiplies and two 2N real multiplies. For real
input data, the output of the FFT is conjugate symmetric so that
only the first N+1 values need be updated. Furthermore, for real
data, the 2N point FFT can be realized with an N point FFT and N
complex multiplies using an array of N complex points. Each N
point FFT requires $(N/2)\log_2(N/2)$ complex multiplies. Therefore,
the number of complex multiplies per block will be
$3\,[(N/2)\log_2(N/2) + N]$ for the three FFT's, 2N complex multiplies
for the complex weighting and adaptation, and 2N real multiplies
for the convergence constants. For the adaptive convergence
constant 5N additional real multiplies are required.

Assuming one complex multiply is equivalent to 4 real
multiplies, the ratio $\gamma$ between the UFLMS real multiplies and
the LMS real multiplies will be

$$\gamma_1 = \frac{3\log_2(N/2) + 11}{N} \qquad (4.28)$$

for constant convergence factor and

$$\gamma_2 = \frac{3\log_2(N/2) + 13.5}{N} \qquad (4.29)$$

for the adaptive convergence factor. This ratio is computed for
several values of N in the following table


   N     $\gamma_1$    $\gamma_2$

   16    1.25     1.4
   32    0.72     0.8
   64    0.4      0.44
  128    0.22     0.24
  256    0.12     0.13
  512    0.068    0.072
 1024    0.037    0.039

From  the  table  we  see  that  for  N  =  32  the  proposed  algorithm  is 
already  more  efficient  than  the  LMS  algorithm. 
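
The multiply counts given above can be tallied directly to check the two ratios. The sketch below is only an illustration: the function name is hypothetical, and it assumes that the LMS count 2N(N+1) is approximated by 2N^2, which is what the closed forms (4.28) and (4.29) imply.

```python
import math

def uflms_to_lms_ratio(n, adaptive=False):
    """Ratio of UFLMS to LMS real multiplies for a block of n points."""
    # Three FFTs, each an n-point FFT plus n complex multiplies.
    fft_complex = 3 * ((n / 2) * math.log2(n / 2) + n)
    # Complex weighting and adaptation.
    weight_complex = 2 * n
    # One complex multiply counted as 4 real multiplies,
    # plus 2n real multiplies for the convergence constants.
    real_mults = 4 * (fft_complex + weight_complex) + 2 * n
    if adaptive:
        real_mults += 5 * n          # adaptive convergence constant
    lms_real = 2 * n * n             # 2N(N+1) approximated by 2N^2 (assumption)
    return real_mults / lms_real

for n in (16, 32, 64, 128, 256, 512, 1024):
    print(n, round(uflms_to_lms_ratio(n), 3), round(uflms_to_lms_ratio(n, True), 3))
```

The printed values approximately reproduce the table; small differences in the last digit come from rounding.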

The UFLMS algorithm requires 3 real arrays of 2N points each
for the three FFT's and one 2N real data array for the filter
coefficients. For the adaptive convergence factor one more array of
N real data points is needed. The sine table for the FFT needs an
array of N/4 real data points. Overall we need about 8N points of
memory compared to only 2N in the LMS algorithm.

We  see  that  what  we  gain  in  computation  we  lose  in  memory. 
For  special  purpose  hardware  the  proposed  algorithm  is  even  more 
efficient  since  we  are  working  on  block  operations  which  can  be 
done  very  efficiently  with  array  processors. 


4.3. The Double Talker Algorithm

The function of the double talker detector is to detect the
presence  of  near  end  talker  voice  input,  so  that  adaptation 
in  the  adaptive  filter  can  be  halted  when  a  near  end  talker 
is  detected  to  avoid  adapting  the  coefficients  in  the  wrong 
direction.  Basically,  the  double  talker  detector  measures 
the  energy  at  both  points  at  the  four-wire  side  of  the 
hybrid,  and  when  the  energy  ratio  between  the  four-wire 
transmit  and  receive  side  exceeds  a  certain  threshold,  the 
algorithm  decides  that  near  end  talker  input  is  present. 

For  the  LMS  and  the  gradient  lattice  algorithm  a  simple 
double  talker  detection  algorithm  taken  from  [6]  is  used. 
It  compares  the  output  signal  d(n)  of  the  hybrid  with  the 
input  signal  x(n)  of  the  hybrid  over  the  L  preceding  sample 
points.  If  the  signal  d(n)  is  greater  than  one-third  the 
largest  absolute  (in  magnitude)  value  of  x(n)  over  L 
preceding  sample  points,  double  talking  is  "detected".  The 
number L is the length of the impulse response of the
transhybrid function, chosen by the user. In the simulation
test-bed  the  user  can  specify  the  threshold  for  the  double 
talker  algorithm.  In  our  simulations  we  set  the  threshold 
to  9  dB  under  the  assumption  (which  is  true  in  our 
simulation)  that  the  transhybrid  loss  is  at  least  9  dB. 
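
The comparison described above can be sketched as follows. This is a minimal illustration rather than the test-bed code: the function name, the choice to include the current sample in the window of L points, and the conversion of the threshold from dB to an amplitude ratio are assumptions.

```python
import numpy as np

def double_talk_flags(x, d, L, threshold_db=9.0):
    """Per-sample double talker decisions.

    Sample n is flagged when |d(n)| exceeds the largest |x| over the
    last L points, scaled down by threshold_db.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    # 9 dB corresponds to an amplitude ratio of about 0.355, close to one-third.
    scale = 10.0 ** (-threshold_db / 20.0)
    flags = np.zeros(len(d), dtype=bool)
    for n in range(len(d)):
        lo = max(0, n - L + 1)
        recent_peak = np.max(np.abs(x[lo:n + 1]))
        flags[n] = abs(d[n]) > scale * recent_peak
    return flags
```

With a transhybrid loss of at least 9 dB, an echo alone stays below the scaled peak, so a flag indicates additional near end energy.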

Since  the  frequency  domain  algorithm  is  a  block  type 
algorithm,  a  slight  modification  has  been  made  in  the 
double talker algorithm. In the frequency domain algorithm
each  block  of  N  points  is  processed  at  the  same  time  and  the 
input  data  is  overlapped  so  that  in  each  block  we  get  M  new 
points  when  M  <  N/2.  For  each  new  input  point  the  double 
talker  algorithm  works  in  the  same  mode  as  for  the  LMS 
algorithm,  but  the  decision  to  adapt  or  not  is  based  on  M 
decisions  at  the  same  time.  In  the  algorithm  that  we  use, 
if  there  is  more  than  one  double  talker  decision  in  the  M 
new  points,  a  double  talker  flag  is  applied  to  the  entire 
block. 

In the double talker algorithm we have a double threshold
option. The first threshold is the individual (point)
threshold, as in the LMS algorithm. The second
threshold is the number of double talker events in one block
of points. In the simulation we ran, the first threshold
was 9 dB, and the second threshold was more than one double
talker decision in the M points. These thresholds were
found experimentally.
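
The block decision built on top of the point decisions can be sketched as below. The function name and the parameter `max_events` are hypothetical; the default of 1 mirrors the setting "more than one double talker decision in the M points".

```python
def block_double_talk(flags_new_points, max_events=1):
    """Return True when the whole block should be flagged as double talk.

    flags_new_points holds the per-point decisions for the M new points
    of the block; adaptation is frozen when the function returns True.
    """
    events = sum(1 for f in flags_new_points if f)
    return events > max_events
```

A single isolated point decision therefore does not freeze adaptation, while two or more within the M new points do.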




5. Improved LPC Analyzer
For Large Dynamic Range Signals

5.1  Introduction 


In  this  section  we  discuss  the  possibility  of  using  an 
Automatic  Gain  Control  (AGC)  in  the  full  duplex  channel  with 
a  vocoder  in  the  loop.  The  main  purpose  of  the  AGC  is  to 
enhance  the  quality  of  the  synthesized  speech  in  low  level 
signals.  A  low  level  signal  can  be  caused  either  by  line 
loss,  which  can  be  as  high  as  15-20  dB  [3]  for  a  long 
distance  line,  or  by  the  speaker  articulation. 

Fig.  3  shows  a  block  diagram  pertaining  to  the 
simulation  of  a  full-duplex  telephone  with  a  vocoder  in  the 
loop.  This  block  diagram  includes  the  A/D  and  D/A  units, 
the echo canceller, the analyzer and synthesizer, and a
digital  simulation  of  the  hybrid  for  the  2-wire  to  4-wire 
conversion.  The  best  place  to  introduce  the  AGC  is  before 
the  A/D  unit.  Since  the  AGC  must  be  an  analog  device  which 
is  highly  nonlinear,  its  interaction  with  other  components 
must  be  carefully  examined — especially  with  the  hybrid  and 
the  echo  canceller.  In  Subsection  5.2  we  discuss  the 
problems  that  arise  from  such  interaction. 

Since the main conclusion of Subsection 5.2 is that the
AGC performance becomes very complicated when interacting
with other components, we chose in Subsection 5.3 to work
directly on the improvement of the LPC analyzer for a low
level signal. In Section 5.4 we describe an experiment in
which good quality telephone line speech has been achieved
over a dynamic range of 30 dB, using the algorithm described
in Section 5.3.

Fig. 19. Closed-loop block diagram of a full duplex
telephone network without echo canceller

5.2  AGC  Interfacing  Problems 

To analyze the effect of an AGC on the full-duplex
system (Fig. 19) in an attempt to achieve better synthesis
quality, some characteristic parameters of the AGC must
first be defined. In the current case (an AGC followed by
an A/D converter and an LPC analyzer) standard AGC
parameters are:

1) Dynamic range of 30 to 40 dB: this parameter is
necessary in offsetting the losses due to telephone lines,
which may be as high as 20 dB [3], and the naturally
occurring energy variations due to articulation.

2) Fast response time of 10 to 30 msec: this is the
time for the AGC to change from a high gain to a low gain.
This fast response time is required to avoid overload upon
precipitous high level input signals.

3) Slow release time of 0.5 to 1 sec: this is the time
for the AGC to change from a low gain to a high gain. This
slow release time is necessary to track the speech
intonation.

The analysis of the system with such an AGC is very
complex.  From  Fig.  19  we  see  that  because  of  the  hybrid 
leakage  we  have  a  closed  feedback  loop  which  may  become 
unstable.  The  main  difficulties  in  analyzing  this  closed 
loop  system  come  from  the  nonlinearities  of  the  LPC 
analyzer  and  the  AGC,  and  the  dynamic  behavior  of  the  echo 
canceller.  Even  more  complexity  is  added  from  the 
adaptation  of  the  echo  canceller  being  frozen  during 
simultaneous  double  talker  conversation.  To  simplify  the 
analysis  we  will  check  conditions  for  stability  of  the 
system  without  the  echo  canceller,  and  then  analyze  the 
effect  of  the  AGC  on  the  performance  of  the  echo  canceller. 
It  can  be  shown  that  stability  of  the  system  without  the 
echo  canceller  and  stability  of  the  echo  canceller  with  the 
AGC  assures  the  stability  of  the  system  under  normal 
conditions. 

5.2.1  Stability  of  the  System  Without  Echo  Canceller 

To check conditions for stability without the echo
canceller we may use the block diagram of Fig. 19. In this
figure the transhybrid responses of the near-end and far-end
talkers are represented respectively by $H_1(z)$ and $H_2(z)$.
The block diagram includes the AGC, the A/D and D/A
converters, the LPC analyzer/synthesizer, and the satellite
delays. Since the LPC analyzer is nonlinear we cannot use
standard linear techniques to check stability of the system.
However, we may use the $L_p$ stability condition theorem [23]
for the purpose (where sufficient conditions for stability
require that the over-all closed loop gain be less than 1).
Since the system is more complex when we include the echo
canceller in the loop, this sufficient criterion assures the
stability of the system. To simplify the analysis, we
replace the transhybrid responses $H_1(z)$ and $H_2(z)$ by
constant gains $\alpha_1$ and $\alpha_2$ for the hybrids,
respectively, which in the worst case can be as high as -6 dB.
If we use an AGC with dynamic range of 30 dB, instability may
occur since the over-all loop gain can be as high as 48 dB in
the worst case. If $G_{AGC}$ is the AGC gain in dB for both sides
and $\alpha_1$ and $\alpha_2$ are the worst case leakage in dB for
the hybrids, the overall closed loop gain will be

$$G_{CL} = 2G_{AGC} + \alpha_1 + \alpha_2 \qquad (5.1)$$


To  assure  stability  of  the  system  we  must  add  some  loss 
in  the  closed  loop  system  to  compensate  for  the  high  AGC 
gain.  Since  the  main  function  of  the  AGC  is  to  maintain  a 
high  signal  to  quantization  noise  ratio  at  the  input  to  the 
analyzer,  the  loss  must  be  introduced  after  the  analyzer 
along  the  transmitting  path.  In  fact,  the  best  place  to 
insert  this  loss  is  after  the  synthesizer.  The  information 
about  the  amount  of  loss  needed  can  be  obtained  from  the  AGC 
gain,  and  transmitted  digitally  through  the  channel  along 
with the information from the analyzer. Fig. 20 represents a
block diagram which includes an analog controlled loss after
the synthesizer. An analog instead of digital loss is
suggested to maintain low quantization noise for low level
signals at the synthesizer output. An accuracy of 3 dB in
the controlled loss is enough to assure stability since a
6 dB minimum loss comes from each hybrid, yielding a singing
margin of 6 dB for the system in the worst case (see NSC
Note 139 for definitions) [4]. Therefore, an additional 4
bits  per  frame  are  needed  to  transfer  the  AGC  information  to 
the  controlled  loss  through  the  channel.  Since  the 
information  from  the  LPC  is  transmitted  for  every  frame,  the 
AGC  gain  information  must  also  be  sent  for  every  frame. 
Therefore,  the  change  in  the  AGC  gain  must  occur  on  frame 
boundaries  only. 
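
The loop gain budget of (5.1) and the effect of the controlled loss can be checked with a few lines. This is a sketch under stated assumptions: the function names are hypothetical, the controlled loss is assumed to act once on each side of the loop, and the bit count assumes uniform 3 dB steps over the AGC range, which is one plausible reading of the 4 bits quoted above.

```python
import math

def closed_loop_gain_db(agc_gain_db, alpha1_db, alpha2_db, loss_db=0.0):
    """Closed loop gain in dB with an optional controlled loss per side."""
    return 2.0 * (agc_gain_db - loss_db) + alpha1_db + alpha2_db

def gain_code_bits(range_db, step_db):
    """Bits needed to code the AGC gain in uniform steps (assumed scheme)."""
    levels = math.floor(range_db / step_db) + 1
    return math.ceil(math.log2(levels))

# Worst case without compensation: 30 dB AGC gain, -6 dB leakage per hybrid.
print(closed_loop_gain_db(30.0, -6.0, -6.0))                 # 48.0
# Controlled loss accurate to 3 dB on each side leaves a 6 dB singing margin.
print(closed_loop_gain_db(30.0, -6.0, -6.0, loss_db=27.0))   # -6.0
print(gain_code_bits(40.0, 3.0))                             # 4
```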

A  partial  conclusion  from  the  discussion  so  far  is  that 
an  AGC  can  be  introduced  as  shown  in  Fig.  20  under  the 
following  conditions: 

1. A gain change in the AGC can occur at frame
boundaries only.

2. A 4-bit quantization of the AGC gain must be
transmitted through the channel for an analog
controlled loss.

In  the  next  section  we  analyze  the  effect  of  the  AGC  on  the 
echo  canceller. 


From the last section we saw that an AGC, with a fast
response time of 22 msec (frame size), can be incorporated
into  the  system  and  can  have  a  dynamic  range  of  40  dB 
without  causing  stability  problems.  The  question  then  is 
how  such  an  AGC  affects  the  performance  of  the  echo 
canceller.  In  Fig.  21  we  show  a  block  diagram  which 
includes  the  hybrid,  the  AGC,  A/D  and  the  echo  canceller 
with  a  double  talker  detector.  To  achieve  good  performance 
from  the  echo  canceller,  the  adaptive  filter  must  compensate 
for  the  transhybrid  response  plus  the  instantaneous  gain  of 
the  AGC.  Since  we  have  the  information  about  the 
instantaneous  gain  of  the  AGC,  this  information  can  be 
incorporated  into  the  adaptive  filter.  The  same  gain 
information  can  be  used  by  the  double  talker  algorithm  to 
compensate  for  the  gain  introduced  by  the  AGC. 

In the ideal case, where the compensation in the two
cases is exact and under the assumption that the hybrid is
linear, there is no interaction problem and, as a result,
the echo canceller will have the same performance with or
without the AGC. Practically, however, we have to be
concerned about how an error in the compensation and
non-linearity in the hybrid will affect the performance of
the echo canceller.

Fig. 21. Proposed full duplex block diagram with AGC compensations

A straightforward calculation of the effect of a gain
error on echo cancelling performance will give us the
following result: an error of 1% in gain will limit the echo
reduction to a maximum of 40 dB. During the adaptation
phase the echo canceller will try to compensate for this
error each time the gain is changed, but during a double
talker situation the adaptation is frozen and the maximum
echo reduction in this case is limited to 40 dB.
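
The 40 dB figure can be checked as follows, assuming that the canceller matches the echo path exactly except for a relative gain error $\epsilon$ (our notation). The residual echo amplitude is then $\epsilon$ times the echo amplitude, so the attainable echo reduction is bounded by

$$-20\log_{10}|\epsilon| = -20\log_{10}(0.01) = 40\ \text{dB}$$

Under the same assumption a 3% gain error would limit the reduction to about 30 dB.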

We assumed so far that the hybrid is linear and can be
modeled by a linear filter. From measurements of the
transhybrid response [13], we found that a linear filter
approximates the transhybrid response only up to 25-28 dB;
this means that with a linear adaptive filter the maximum
echo reduction can be 25 to 28 dB. From those measurements
we found also that a second-order Volterra series can
achieve 8 dB more echo reduction [13]. In a system without
AGC  the  maximum  level  of  the  echo  will  be  given  by  the  dB 
sum  of  the  transhybrid  loss  and  the  maximum  attainable  echo 
reduction.  In  the  worst  case  the  transhybrid  loss  is  6  dB, 
and  if  we  add  to  it  the  maximum  echo  reduction  of  28  dB  we 
see  that  the  maximum  level  of  the  echo  becomes  -34  dB.  When 
we  introduce  an  AGC  in  the  loop,  the  gain  of  the  AGC  must  be 
added  also.  If,  for  example,  the  gain  is  20  dB,  the  maximum 
level  of  the  echo  will  be  -14  dB,  which  is  intolerable  for 
long  distance  calls. 
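
The echo level budget in this paragraph is a simple sum in dB, sketched below with a hypothetical helper; levels are taken relative to the far end signal level, which is an assumption about the reference used.

```python
def max_echo_level_db(transhybrid_loss_db, echo_reduction_db, agc_gain_db=0.0):
    """Worst case echo level in dB relative to the far end signal."""
    return -(transhybrid_loss_db + echo_reduction_db) + agc_gain_db

print(max_echo_level_db(6.0, 28.0))                    # -34.0 without AGC
print(max_echo_level_db(6.0, 28.0, agc_gain_db=20.0))  # -14.0 with 20 dB AGC gain
```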

A conclusion from the above discussion is that a
complex AGC with special specifications can solve the
oscillation problems and achieve a low quantization noise
for low level signals at the LPC analyzer input. However,
such an AGC will reduce the echo reduction achieved by the
echo canceller algorithm to an unacceptable level. This
loss in the echo reduction performance is due mainly to the
non-linearity of the hybrid.

In the next section we present a different approach to
solving the same problem by directly improving the existing
LPC analyzer to perform better for larger dynamic ranges.

5.3  Pitch  and  Voicing  Algorithm  Improvement 

Pitch  and  voicing  estimation  is  probably  one  of  the 
most  difficult  and  challenging  problems  in  speech  analysis 
for  narrow  band  voice  coding.  It  is  even  more  difficult  to 
design  an  algorithm  that  will  work  well  for  speech  signals 
which  have  a  very  wide  dynamic  range.  Although  we  have 
shown  in  a  previous  note  [11]  that  the  cepstrally  based 
pitch  and  voicing  estimation  method  gives  accurate  pitch  and 
voicing  analysis  results  and  is  particularly  robust  in  a 
noisy  environment,  substantial  degradation  in  the  voicing 
decision  was  recently  found  when  it  was  applied  to  speech 
signals  of  extremely  low  level  with  peak  amplitude  of  the 
order  of  5-6  bits.  In  this  section  we  try  to  improve  the 
algorithm  used  for  dealing  with  extremely  low  level  signals. 

In this section we describe the modifications made to
the original algorithm [11] and report in the next section
experimental results of applying the modified algorithm to
speech segments with abrupt changes of over 20 dB in signal
level. Significant improvement in performance will be shown
and its effect in Wideband Integrated Network Communication
will be discussed.

5.3.1  Modifications 

As reported in [11], four parameters are used in making
the  voicing  decision — the  first  reflection  coefficient,  the 
energy  parameter,  the  zero  crossing  rate,  and  the  cepstral 
peak  value.  These  parameters  and  the  voicing  decision  logic 
that  uses  these  parameters  are  the  focus  of  the 
modifications  which  are  listed  below: 

A) The first reflection coefficient (K1)

The first reflection coefficient appears to be the
parameter least affected by inadequate signal level.
However, it is desirable to use in the voicing decision a
K1 that is obtained independently of analysis
conditions such as the pre-emphasis factor. Since K1
provides information on the first order spectral shape,
we prefer to use a K1 obtained from a signal that is
constantly pre-emphasized by a factor of 0.9. Such a
factor will lead to a rough voicing indication: in
general, a positive K1 for an unvoiced frame and a
negative K1 for a voiced frame. Unfortunately, such a
pre-emphasis factor may require extra signal processing
since the pre-emphasis factor in a general vocoder
analyzer may be differently specified. In order to
avoid introducing extra complexity in computing the 0.9
pre-emphasized K1 we use the following approximations.
Let x(n) and y(n) be the input sequence and the
pre-emphasized sequence with a factor of p,
respectively. That is,

$$y(n) = x(n) - p\,x(n-1) \qquad (5.2)$$

Expressing the first two autocorrelation terms of y(n)
in terms of the autocorrelation of x(n), we have

$$r_y(0) = \sum_n y^2(n) = \sum_n x^2(n) - 2p \sum_n x(n)\,x(n-1) + p^2 \sum_n x^2(n-1)$$
$$\approx r_x(0) - 2p\,r_x(1) + p^2 r_x(0) = (1+p^2)\,r_x(0) - 2p\,r_x(1) \qquad (5.3)$$

and

$$r_y(1) = \sum_n y(n)\,y(n+1) = (1+p^2)\,r_x(1) - p\,[r_x(2) + r_x(0)] \qquad (5.4)$$

As a result, the first reflection coefficient K1
corresponding to sequence y(n) is

$$K_1 = -\frac{r_y(1)}{r_y(0)} = \frac{p\,[r_x(2) + r_x(0)] - (1+p^2)\,r_x(1)}{(1+p^2)\,r_x(0) - 2p\,r_x(1)} \qquad (5.5)$$

In this case, where p = 0.9,

$$K_1 = \frac{0.9\,[r_x(2) + r_x(0)] - 1.81\,r_x(1)}{1.81\,r_x(0) - 1.8\,r_x(1)} \qquad (5.6)$$

This equation thus eliminates the need for an extra
pre-emphasis operation, and the K1 will be approximately
the first reflection coefficient corresponding to a
constantly pre-emphasized (with p = 0.9) data sequence
for the pitch and voicing algorithm.
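
The approximation above can be compared numerically with the K1 obtained by actually pre-emphasizing the data. The sketch below is an illustration only: the function names are hypothetical, and the autocorrelation is taken as a plain sum of lagged products over one frame, which is an assumption about the analyzer.

```python
import numpy as np

def autocorr(x, lag):
    """Short-time autocorrelation: sum over n of x(n) x(n + lag)."""
    x = np.asarray(x, dtype=float)
    return float(np.dot(x[:len(x) - lag], x[lag:]))

def k1_preemphasized(x, p=0.9):
    """Approximate K1 of y(n) = x(n) - p x(n-1) from r_x(0), r_x(1), r_x(2)."""
    r0, r1, r2 = (autocorr(x, k) for k in range(3))
    return (p * (r2 + r0) - (1 + p * p) * r1) / ((1 + p * p) * r0 - 2 * p * r1)

def k1_direct(x, p=0.9):
    """K1 computed after explicitly pre-emphasizing the data."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    y[1:] -= p * x[:-1]
    return -autocorr(y, 1) / autocorr(y, 0)

n = np.arange(8000)
voiced_like = np.sin(2 * np.pi * 200.0 * n / 8000.0)       # low frequency tone
unvoiced_like = np.random.default_rng(0).standard_normal(8000)
print(k1_preemphasized(voiced_like), k1_direct(voiced_like))
print(k1_preemphasized(unvoiced_like), k1_direct(unvoiced_like))
```

For these two test signals the tone gives a negative K1 and the white noise a positive K1, in line with the rough voicing indication described earlier, and the two ways of computing K1 differ only through frame edge terms.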

Another modification to K1 is in the normalization
of the parameter to a corresponding one with 8 kHz
sampling frequency. In [11], the normalization factor
is defined as

$$\text{normalization factor} = 8000/f_s \quad \text{for all } f_s \qquad (5.7)$$

where $f_s$ is the sampling frequency (in Hz) of the data
to be analyzed. Such a normalization factor is
generally appropriate for $f_s < 8$ kHz due to the shape of
the autocorrelation function as illustrated in Fig. 22.
However, it is too far off for $f_s > 8$ kHz, resulting in
more voiced to unvoiced errors due to an over-reduced K1
score. We have found from experiment that a
normalization factor of

$$\text{normalization factor} = (8000/f_s)^{1/3} \qquad (5.8)$$

is more suitable for $f_s > 8$ kHz. Such modification
significantly improved the voicing decision for $f_s > 8$
kHz.

Fig. 22. Normalization of r(1)

B) The Energy Parameter

The energy parameter should be carefully defined,
particularly when the input level is low. Further
experiments have shown that the signal energy is more
appropriate in the voicing decision than the residual
energy. Therefore, instead of the residual energy
which was used in [32], we now employ the signal
energy. Note that the energy parameter is a
complicated function of the energy term as discussed in
[11]. The simple change from the residual energy to
the signal energy has more implications than the terms
suggest (refer to [11] for details).

As  the  signal  energy  is  always  higher  than  the 
residual  energy,  the  two  terms  RMSUV  and  RMSAVE,  which 
represent  the  smoothed  unvoiced  energy  and  the  overall 
energy  contours,  require  a  higher  initial  value.  We 
have  found  that  a  value  of  512  (increased  from  the 
previous  256)  gives  good  results. 

C)  The  Zero  Crossing  Rate 

In [11], an integer bias term was used in the zero
crossing count. The bias term alternates its sign for
successive samples to avoid a very small zero crossing
rate when the signal is almost a constant. The term
was defined as

b = INT(RMSUV/128) + 4

where INT and RMSUV denote the truncation to integer
operation and the smoothed unvoiced energy
respectively.

Apparently, the integer operation and the constant
term "4" will cause problems in obtaining a reliable
zero crossing count when the signal level is low. In
particular, the constant term will lead to an excessive
zero crossing rate for low level signals resulting in
voiced to unvoiced errors. Deleting this constant term
in turn may yield a zero bias, due to the integer
operation, that will result in a very small zero
crossing rate and an unvoiced to voiced error when the
signal is low.

The following modification is thus made:

b = RMSUV/128 (5.9)

By deleting the constant term and resorting to floating
point operation, a much more reliable zero crossing
count is obtained in dealing with low level signals.
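
The difference between the two bias definitions at low signal levels can be seen from a small numerical comparison. The function names are hypothetical, RMSUV is assumed to be nonnegative, and how the bias enters the zero crossing count itself is described in [11] and is not reproduced here.

```python
def zero_crossing_bias_old(rmsuv):
    """Bias used in [11]: b = INT(RMSUV/128) + 4."""
    return int(rmsuv / 128) + 4

def zero_crossing_bias_new(rmsuv):
    """Modified bias of (5.9): b = RMSUV/128 in floating point."""
    return rmsuv / 128.0

for rmsuv in (32, 128, 512):
    print(rmsuv, zero_crossing_bias_old(rmsuv), zero_crossing_bias_new(rmsuv))
```

For RMSUV = 32 the old bias is 4, dominated by the constant term, while dropping the constant under integer truncation would give 0; the floating point form gives 0.25 and scales with the signal level.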

D)  Voicing  Decision  Logic 

Only minor changes in the voicing decision logic
were made. The modification involves the non-linear
score for an extremely high or extremely low cepstral
peak, and is listed as follows:

i) ICX = 250 (was 180) when ICPT exceeds 60, and

ii) ICX = -200 (was -400) when ICPT falls below 10 (was 19),

where ICPT and ICX denote the normalized cepstral peak
value and the voicing score due to cepstral peak,
respectively. It can be seen that the modification
emphasizes the voicing for high cepstral peaks and
softens the unvoicing override when the cepstral peak
is low, a case that may occur for a voiced frame when
the signal is low.

As  will  be  demonstrated  in  the  next  section  these 
modifications  greatly  improve  the  voicing  accuracy. 
Also,  the  pitch  accuracy  at  the  same  time  is  well 
maintained  as  compared  to  the  same  signal  of  higher 
level. 

5.4  Experimental  Results 

To check the performance of the existing LPC analyzer
on  large  dynamic  range  signals  we  set  up  the  following 
experiment.  A  speech  sample  60  sec  long  was  digitized  by  a 
12-bit  linear  A/D  and  then  analyzed  and  synthesized  for 
reference.  Then  the  original  speech  data  was  divided  in 
three  equal  parts  of  20  sec  each.  The  first  section 
remained  the  same,  the  second  section  was  divided  by  8  and 
the  third  section  by  32.  In  this  way  we  formed  test  speech 
with  more  than  30  dB  dynamic  range.  This  test  speech  has 
been  analyzed  and  synthesized  by  both  the  existing  LPC 
algorithm  and  the  improved  algorithm  described  in  the  last 
section.  The  synthesized  speech  from  the  existing  LPC 
analyzer was highly distorted mainly because of errors in
voice/unvoice  decisions.  The  improved  pitch  algorithm 
achieves  a  fairly  good  synthesized  speech  without  errors  in 
voice/unvoice  decisions  and  the  quality  of  the  synthesized 
speech  was  comparable  to  the  original  synthesis. 
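
The level arithmetic behind this test signal is easy to check. The sketch below, with hypothetical function names, converts the two divisors into dB and builds a three-part test sequence in the same way; it assumes the sections are simply scaled sample by sample.

```python
import math


def attenuation_db(divisor: float) -> float:
    """Level drop in dB caused by dividing the samples by `divisor`."""
    return 20.0 * math.log10(divisor)


def build_test_speech(samples, divisors=(1, 8, 32)):
    """Split `samples` into equal parts and scale each part down.

    Samples left over when the length is not a multiple of the number
    of parts are dropped.
    """
    part = len(samples) // len(divisors)
    out = []
    for i, d in enumerate(divisors):
        out.extend(s / d for s in samples[i * part:(i + 1) * part])
    return out


levels = [round(attenuation_db(d), 1) for d in (1, 8, 32)]
```

Division by 8 and by 32 corresponds to about 18 dB and 30 dB of attenuation, which is where the "more than 30 dB" dynamic range comes from.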

To  illustrate  the  improvement  in  the  voice/unvoice 
detection  algorithm,  we  created  a  new  test  speech  segment. 
This  new  test  segment  includes  60  speech  frames.  The  first 
20  frames  were  chosen  from  the  original  speech  data,  the 
next  20  frames  are  the  same  first  20  frames  but  divided  by 
8,  and  the  last  20  frames  are  the  same  first  20  frames 
divided  by  32.  In  the  upper  part  of  Fig.  23  we  have  the 
display  of  this  new  test  segment,  and  in  the  lower  part  we 
show  the  synchronized  pitch  and  voicing  output  using  the 
existing pitch algorithm. A number of voice/unvoice
detection errors can be observed where the speech level is
low. The corresponding pitch and voicing output of the test
segment using the improved algorithm is shown in the lower
part of Fig. 24. We can see that with the improved
algorithm no such errors occur.

To illustrate the effect of quantization noise on the
reflection coefficients we present Fig. 25. Fig. 25a
presents the smoothed LPC spectrum of 10 consecutive frames
of the original speech, Fig. 25b presents the smoothed
LPC spectrum for the same frames with an 18 dB lower signal
to quantization noise ratio, and Fig. 25c presents the
smoothed power spectrum for the same frames with a 30 dB
lower signal to quantization noise ratio. The results show
that the distortion introduced by the quantization noise is
fairly low. This confirms our impression from informal
listening that the synthesized speech with the improved
pitch algorithm is fairly good for a large dynamic range in
speech input.
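
The 18 dB and 30 dB figures follow from the fact that the quantization step stays fixed while the signal is scaled down. A rough numerical check is sketched below. It uses a sine wave in place of speech and models the converter as ideal rounding to integers; both are assumptions made only for illustration.

```python
import math


def sqnr_db(amplitude: float, n: int = 20000) -> float:
    """SQNR of a sine of the given peak amplitude after integer rounding."""
    sig_pow = noise_pow = 0.0
    for k in range(n):
        x = amplitude * math.sin(0.1234567 * k)  # arbitrary test tone
        err = round(x) - x                       # uniform quantizer, step 1
        sig_pow += x * x
        noise_pow += err * err
    return 10.0 * math.log10(sig_pow / noise_pow)


full_scale = 2047.0  # peak of a 12-bit two's-complement converter
reference = sqnr_db(full_scale)
drop_8 = reference - sqnr_db(full_scale / 8)
drop_32 = reference - sqnr_db(full_scale / 32)
```

Under these assumptions the two drops come out close to 20 log10(8) and 20 log10(32), that is, near 18 dB and 30 dB.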

5.5  Conclusion 


Our main conclusion from the study reported in this
section is that an improved analyzer, as introduced in this
note, solves the dynamic range problem at the input of the
LPC analyzer. This solution is much more efficient than an
AGC in terms of both complexity and performance.

Although  an  AGC  looks  intuitively  attractive,  a  careful 
check  of  its  interaction  with  the  entire  system  leads  to  the 
conclusion  that  even  a  highly  complex  AGC  still  reduces  the 
performance  of  the  echo  cancelling  algorithm. 

6.  Experimental  Results


6.1  Introduction 


The main purpose in developing the test-bed simulation
was to check the efficiency of the algorithms described so
far in the context of the entire full-duplex network.
Over the course of the project a large number of experiments
were run. In fact, every algorithm inserted in the test-bed
simulation was first tested by a special test program, and
the best parameters for the specific algorithm were chosen.
In this section we report those final experiments that were
done to check the interaction between a specific algorithm
and the full-duplex system under real-time conditions.
Since the decisions on algorithm parameters and their
efficiency are based on the subjective speech quality at the
output of the system, the description of the results will be
largely qualitative.

The organization of this section is as follows. First
we  describe  the  options  available  in  the  test-bed 
simulation.  Different  network  configurations  can  be 
designed  with  these  options.  Then  we  present  a  number  of 
experiments  done  to  check  the  following  main  points: 

1.  Echo effect on speech perception as a function of
the delay length.

2.  Effect of the LPC vocoder in a full-duplex network
without echo canceller.

3.  Performance of echo canceller algorithms without
vocoder in the loop.

4.  Study of echo canceller algorithms with vocoder in
the loop.

5.  Performance of echo canceller algorithms with
time-variant transhybrid response.

6.2  Test-bed  Simulation  Options 

The  test-bed  simulation  program  is  very  flexible  and  by 
means  of  "software  switches"  allows  the  user  to  choose 
different  configurations  and  different  parameters  for 
various  experiments. 

The  main  options  are: 

with  or  without  vocoder  in  the  loop 
with  or  without  double-talker  algorithm 
with  or  without  echo  cancelling  algorithm 

By means of "software switches", we can choose one of four
different adaptive filter algorithms, and we can choose
different hybrid responses or the kind of filter (IIR or
FIR) used as the transhybrid response. We can also run
experiments with one or both talkers active at the same
time.

Each algorithm is controlled by its own parameters.
Some of the parameters are controlled through
the common array from the ILS system; other parameters are
introduced directly through the simulation program. By use
of these parameters the user can change the conditions or
thresholds of different algorithms.
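
One way to picture the "software switches" is as a single configuration record. The sketch below is purely illustrative: every name in it is hypothetical, the filter-kind labels read the transhybrid filter types as IIR and FIR, and only the three adaptive algorithms named in this section are listed even though the simulation offers four.

```python
from dataclasses import dataclass
from enum import Enum


class AdaptiveFilter(Enum):
    # Only the algorithms named in this section; the text says four exist.
    LMS = "lms"
    GRADIENT_LATTICE = "gradient_lattice"
    UFLMS = "unconstrained_frequency_domain"


class HybridFilterKind(Enum):
    FIR = "fir"
    IIR = "iir"


@dataclass(frozen=True)
class SimulationConfig:
    """Hypothetical container for the test-bed 'software switches'."""
    vocoder_in_loop: bool = False
    double_talker_algorithm: bool = False
    echo_canceller: bool = False
    adaptive_filter: AdaptiveFilter = AdaptiveFilter.LMS
    hybrid_kind: HybridFilterKind = HybridFilterKind.FIR
    both_talkers_active: bool = False
    delay_ms: int = 300  # transmission delay quoted for Experiments 2 and 3


# Experiment 3 as described in section 6.3.3: canceller and double-talker
# logic switched on, vocoder switched off.
experiment_3 = SimulationConfig(
    double_talker_algorithm=True,
    echo_canceller=True,
    adaptive_filter=AdaptiveFilter.UFLMS,
)
```

Keeping the switches in one immutable record makes each experiment a named configuration rather than a set of scattered flags.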

6.3  Experimental  Results 

6.3.1  Experiment  1:  Echo  effect  on  speech  perception

as  a  function  of  the  delay  length. 

In this experiment we set the "software switches" in
the test-bed simulation so that we got a very simple model
which included only the programmable delay line and a simple
transhybrid loss. The transhybrid responses chosen in this
experiment were simple gain factors β1 and β2, as in
Figure 26.

We ran the digitized sentences from two speakers,
collected in the same way as described in section 2.3.1. We
ran the same data with four different parameter
combinations: two different values for the transhybrid
response (β1 = β2 = -5 dB and -10 dB), and two different
lengths for the delay (L1 = L2 = 100 msec and 400 msec).
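
The simple model used in this experiment amounts to a pure gain and a pure delay in the echo path. A minimal sketch follows. The function names and the 8 kHz sampling rate are assumptions, and `delay_ms` stands for the total delay of the echo as heard by the talker; how that total is split between the two delay lines of Figure 26 is not assumed.

```python
def echo_path(talker, gain_db, delay_ms, fs_hz=8000):
    """Echo of the talker's own signal: attenuated and delayed copy."""
    gain = 10.0 ** (gain_db / 20.0)                 # -5 dB -> about 0.56
    delay = int(round(delay_ms * fs_hz / 1000.0))   # delay in samples
    echo = [0.0] * len(talker)
    for n in range(delay, len(talker)):
        echo[n] = gain * talker[n - delay]
    return echo


def received(far_end, talker, gain_db, delay_ms, fs_hz=8000):
    """What the talker hears: far-end speech plus the returned echo."""
    echo = echo_path(talker, gain_db, delay_ms, fs_hz)
    return [f + e for f, e in zip(far_end, echo)]


# An impulse sent through a -5 dB, 100 msec echo path.
impulse = [1.0] + [0.0] * 999
impulse_echo = echo_path(impulse, gain_db=-5.0, delay_ms=100.0)
```

With these settings the echo of the impulse appears 800 samples later, scaled by the transhybrid gain.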

Fig. 26  Simplified full-duplex telephone line model

Fig. 27  Output: (a) without echo canceller, (b) with
transhybrid response equal to zero (ideal), (c) with
frequency domain LMS algorithm, (d) with Widrow LMS
algorithm, and (e) with gradient lattice algorithm.
Identical scale for b-e, but scale of a is three times
larger than the scale of b-e.


By listening to the outputs, with all 4 different
combinations of transhybrid gain and delay, it was
perceptually apparent that the effect of the echo increased
as the delay increased from 100 msec to 400 msec; of course,
with higher loss in the hybrid the effect of the echo
decreased. However, even with the higher loss, the longer
delay still increased the perceptual effect of the echo.
This result confirmed results reported in the literature [3]
on real satellite communication telephone lines. An
interesting point to notice is that even without the
psychological effect of the long delay on the talker's
conversation, longer delay increased the effect of the echo.

6.3.2  Experiment  2:  Effect  of  LPC  vocoder  in  the  full 

duplex  network  without  echo  canceller. 

In this experiment the "software switches" were chosen
so that the full-duplex system included a realistic
simulation of the transhybrid response, represented by a
64-point FIR filter, the LPC analyzers and synthesizers,
and a transmission delay time of 300 msec.

The  digitized  sentences  from  two  speakers  collected 
from  real  telephone  lines  were  used  as  input  to  the  test-bed 
simulation  program.  The  sentences  were  specially  chosen 
such  that  a  number  of  double  talker  situations  occur.  The 
average  transhybrid  loss  used  in  this  simulation  was  9  dB. 
We  ran  the  data  with  and  without  an  LPC  vocoder  in  the  loop. 

The main result from this test was that, with the same
transhybrid response for both cases, the effect of the echo
was much more annoying with the vocoder than without the
vocoder. During double-talker situations, the distortions
introduced by the LPC analyzer were so high that
intelligibility was lost.

The  conclusion  from  this  test  was  that  with  a  vocoder 
in  the  loop  a  faster  convergence  rate  is  needed  for  the  echo 
cancelling  algorithm. 

6.3.3  Experiment  3:  Performances  of  echo  cancellers 

without  a  vocoder. 


The purpose of this experiment was to check the
performance of three adaptive filter algorithms (the LMS
algorithm, the gradient lattice algorithm and the
unconstrained frequency domain algorithm) as echo
cancellers. The test-bed simulation "software switches"
were chosen so that the full-duplex system would include the
transhybrid response, a transmission delay line of 300 msec
and one of the echo canceller algorithms mentioned above.
In this experiment the double talker algorithm was active.
With each adaptive algorithm the test-bed simulation program
selected the appropriate double talker algorithm. The
double talker parameters, in this experiment, were selected
to get the best performance.

A digital conversation between two speakers was
collected over real telephone lines and run through the
test-bed simulation with the three different algorithms.
The conversation was planned so that the far-end speaker was
silent during the beginning part of the conversation. The
output at the near-end side at the beginning of the
conversation is given in Figure 27 for different
conditions. Since the far-end speaker was silent at the
beginning of the conversation, in Figure 27 we observe only
the returned echo at the near-end side. In Figure 27c we
observe the faster convergence of the unconstrained
frequency domain algorithm (UFLMS) compared to the LMS
algorithm (Figure 27d) and the gradient lattice algorithm
(Figure 27e). In this case the UFLMS achieved 43 dB echo
cancellation after 250 msec compared to only 30 dB for the
LMS algorithm after 250 msec under the same conditions.
However, both algorithms converge to the same final echo
cancellation of 50 dB.

With the lattice algorithm, the echo cancellation
obtained was 33 dB in 250 msec, but the final echo
cancellation was 40 dB. The poor performance of the
gradient lattice algorithm can be seen in Figure 27e.
Faster convergence for the gradient lattice algorithm can be
achieved with higher values for the convergence constants,
but serious distortion, such as transient spikes, was then
introduced. The poor performance of the gradient lattice
technique is due to the non-stationary behavior of speech.

This conclusion is confirmed on large amounts of speech
data. For a short segment of speech, fast convergence can be
achieved with the gradient lattice algorithm, but for long
speech data, because of the speech non-stationarity, the
convergence constants must be chosen small enough to avoid
unstable behavior, or else the performance of the algorithm
may become very poor.

A conclusion from this experiment is that the UFLMS
achieves the best performance in both speed of convergence
and final amount of echo reduction after convergence. The
lattice algorithm has faster convergence than the LMS
algorithm, but because of speech non-stationarity the
algorithm does not converge to the optimal solution.
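
The echo cancellation figures quoted in this experiment are ratios of echo power to residual power, measured in a window after a given adaptation time. The sketch below shows that measurement with a plain time-domain LMS canceller. It is not the test-bed code: the white-noise input, the 4-tap toy echo path, the step size and all names are assumptions chosen to keep the example short, and the far end is silent as in the experiment.

```python
import math
import random


def lms_echo_canceller(reference, received, taps, mu):
    """Time-domain LMS: returns the residual (echo-cancelled) signal."""
    w = [0.0] * taps
    history = [0.0] * taps            # most recent reference samples first
    residual = []
    for x, d in zip(reference, received):
        history = [x] + history[:-1]
        estimate = sum(wi * hi for wi, hi in zip(w, history))
        e = d - estimate              # what is sent on after cancellation
        w = [wi + mu * e * hi for wi, hi in zip(w, history)]
        residual.append(e)
    return residual


def cancellation_db(echo, residual):
    """Echo power over residual power, in dB, for one analysis window."""
    p_echo = sum(v * v for v in echo)
    p_res = sum(v * v for v in residual) + 1e-30   # avoid log of zero
    return 10.0 * math.log10(p_echo / p_res)


random.seed(0)
toy_path = [0.4, -0.25, 0.1, 0.05]                 # toy transhybrid response
x = [random.uniform(-1.0, 1.0) for _ in range(4000)]
echo = [sum(h * x[n - k] for k, h in enumerate(toy_path) if n >= k)
        for n in range(len(x))]
res = lms_echo_canceller(x, echo, taps=8, mu=0.05)
early = cancellation_db(echo[:250], res[:250])
late = cancellation_db(echo[-250:], res[-250:])
```

With a stationary white input and a noise-free echo path the residual keeps falling, which is far more favorable than the speech case reported above; the point of the sketch is only how "dB of cancellation after a given time" is computed.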

6.3.4  Experiment  4:  Performance  of  echo  canceller 

with  an  LPC  vocoder  in  the  loop. 

The  main  purpose  of  this  experiment  was  to  check  the 
interaction  between  the  LPC  vocoder  and  the  echo  cancelling 
algorithm.  In  this  experiment  we  added  the  LPC  analyzer  and 
synthesizer  to  the  test-bed  simulation  set-up  which  was  used 
in  the  last  experiment. 

The last experiment was repeated with the vocoder
in the loop. The results are given in Figure 28.
Figure 28a shows the near-end output in an echo-free
condition; in fact, since there is no echo this is the
information sent by the far-end speaker, and we see that the
far-end talker is silent at the beginning of the
conversation. Figure 28b shows the output at the near-end
side when the system includes the transhybrid response but
not the echo canceller. Since the far-end talker is silent
at the beginning of the conversation, Figure 28b presents
the returned echo at the near-end side without the echo
canceller. Figure 28c presents the near-end output when the
LMS algorithm was used as the echo canceller. Figure 28d
presents the near-end output when the frequency domain
algorithm was used as the echo canceller; from this result
we observe the faster convergence of the UFLMS compared to
the Widrow LMS algorithm. Figure 28e presents the near-end
output when the gradient lattice algorithm was used as the
echo canceller. From Figure 28e we observe the
poor performance of the gradient lattice algorithm. The
interesting result was that with the vocoder in the loop the
gradient lattice performs worse than without the vocoder in
the loop. Without the vocoder in the loop we could find a
convergence constant such that for a conversation of 30 sec
the adaptive filter was stable. With the vocoder in the
loop we could not find such a convergence constant. With
every convergence constant some instability occurred in one
place or another. Only with very low convergence constants
was the adaptive filter stable, but then we had very poor
echo cancelling. It appears that the main reason for this
behavior of the gradient lattice algorithm is that the LPC
synthesizer output is much more nonstationary than the
original speech; the abrupt change of the prediction
coefficients every frame introduced spikes at the gradient
lattice output, since the lattice algorithm is designed for
stationary input.

Figures 29a to 29e show the same outputs at the far
end; we see that the results at the far-end output behave
the same way. The main result from this experiment was that,
with the vocoder in the loop, the frequency domain
algorithm has better performance than the LMS algorithm
and that the lattice algorithm cannot be used at all in our
application.

6.3.5  Experiment  5:  Performance  of  echo  cancellers  with 

time-variant  transhybrid  response. 

In  the  experiments  presented  so  far  the  transhybrid 
response  was  fixed  during  all  the  experiments.  As  explained 
earlier  the  transhybrid  response  library  includes  14 
different  loading  conditions.  In  the  test-bed  simulation 
program  there  is  a  "software  switch"  for  time  variant 
transhybrid  response.  By  activating  this  switch  the  user 
can  specify  a  time  interval  T  so  that  every  T  seconds  the 
transhybrid  response  is  changed  successively  from  the 
library. 
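
The time-variant switch can be pictured as a schedule that maps each sample to one entry of the response library. A minimal sketch follows. The function name and the 8 kHz sampling rate are assumptions, and so is the wrap-around to the first response once the 14 entries have been used.

```python
def active_response_index(sample_index, fs_hz, interval_s, library_size=14):
    """Index of the transhybrid response in force at a given sample."""
    samples_per_step = int(round(interval_s * fs_hz))
    step = sample_index // samples_per_step   # number of changes so far
    return step % library_size                # assumed to cycle


# With T = 1 sec at an assumed 8 kHz, the response changes every 8000
# samples and the library repeats after 14 sec.
schedule = [active_response_index(n, 8000, 1.0)
            for n in (0, 7999, 8000, 14 * 8000)]
```

The adaptive filter therefore has at most T seconds to re-converge before the echo path moves again.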

In this experiment we used the time-variant switch with
T = 1 sec and ran a conversation of 50 sec through the
test-bed simulation with the LMS and the UFLMS algorithms.
The results of this experiment are given in Figures 30-33.

Figures 30 and 31 show the near-end and far-end outputs
with the LMS algorithm. Figure 30 presents the results with
constant transhybrid response; Figure 31 presents the
results with the dynamic transhybrid response. It is clear
from those results that for such conditions the LMS
algorithm did not achieve good echo reduction. From those
figures we can observe the high echo in Figure 31 compared
to the low echo in Figure 30 after convergence.

Fig. 31  Output at both sites - LMS algorithm with dynamic
transhybrid response (changes every 1 sec; the display is
10 sec)

Those results were obtained with a convergence factor
of μ = 0.1 × 10^-7, which was the highest convergence
constant possible without a stability problem. An
interesting point to note here is that with constant
transhybrid response the highest convergence constant was
μ = 1.1 × 10^-6; this means that the time-variance of the
transhybrid response forces us to use a lower convergence
constant, which means slower convergence.

Figures 32 and 33 show the same outputs for the
frequency domain algorithm. Figure 32 shows the near-end and
far-end outputs with constant transhybrid response.
Figure 33 shows the same outputs with time-variable
transhybrid response. From these results we see that the
frequency domain algorithm, because of its fast convergence,
has almost the same performance for time-variable
transhybrid response. An important point to notice is that
in both cases we used the same convergence constants (0.2
for the convergence constant and 0.9 for the power smoothing
constant).
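
The two constants play different roles. In frequency domain adaptive filters the step applied in each frequency bin is commonly normalized by a recursively smoothed estimate of the input power in that bin. The sketch below shows that common form; it is an assumption that the test-bed uses exactly this recursion, and the function and parameter names are hypothetical.

```python
def update_bin_steps(power, spectrum, mu=0.2, smoothing=0.9):
    """Recursively smooth per-bin input power and derive per-bin steps.

    power    -- previous smoothed power estimate for each frequency bin
    spectrum -- complex DFT of the current reference block
    """
    new_power = [smoothing * p + (1.0 - smoothing) * abs(x) ** 2
                 for p, x in zip(power, spectrum)]
    # Bins with no power get a zero step rather than a division by zero.
    steps = [mu / p if p > 0.0 else 0.0 for p in new_power]
    return new_power, steps


smoothed, steps = update_bin_steps([1.0, 4.0], [2 + 0j, 0j])
```

Because each bin is normalized by its own power, the same pair of constants can remain usable when the input level or the echo path changes, which would be consistent with the observation above.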

In summary, a conversation 50 sec long was
passed through the full-duplex channel with time-variant
transhybrid response with both the LMS and the UFLMS
algorithms. The test-bed simulation outputs at both ends of
the full-duplex channel were recorded. From listening
to those outputs the superior behavior of the frequency
domain algorithm is clear. This conversation contains a
number of situations where both speakers talk
simultaneously. In these situations we can observe the
distortion introduced with the LMS algorithm compared to the
good speech quality achieved by the UFLMS algorithm.




Summary 

During this program a full-scale test-bed simulation of
the interfacing of a telephone to the wideband integrated
network was completed. This simulation includes the use of
different types of echo cancelling algorithms and an LPC
vocoder, and allows for other studies to be carried out.

Echo cancelling algorithms were studied, with a number
of interesting conclusions. Because of the non-stationarity
of the speech signals, made more so by the vocoder, standard
LMS algorithms and lattice techniques are not adequate
because of their convergence properties. With the vocoder
in the loop, convergence must be faster than without the
vocoder, because synthetic speech signals are not so rich in
components as are the actual speech signals. An
unconstrained frequency domain adaptive filter algorithm was
the most effective.

The echo cancelling is limited by the nonlinearities of
the system. It was found that a nonlinear adaptive filter
could reduce the signal-to-noise ratio by a few more dB when
used with a stationary system (the hybrid and line
characteristics remain fixed). Further study would be
needed before such a nonlinear adaptive filter should be
used in a dynamic situation.

The possible use of an AGC in the system was studied,
and it was concluded that the AGC would cause more problems
than it would solve. The dynamic range problem of the analog
signals is greatly reduced by improving the pitch and
voicing detection algorithms.

The  next  step  should  be  the  breadboarding  of  a  real 
time  system  to  be  tested  in  realistic  situations  which  might 
show  up  problems  not  seen  in  a  simulation. 